
Project Report

On

Speech compression and decompression

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

In

Electronics & Communication Engineering

Under the Guidance of: Miss Lovleen Kaur

Submitted by:
Pankaj Singh Negi (10807965)
Saurabh Lohani (10808634)

Department of Electronics & Comm. Engg

Lovely Professional University

Phagwara–140 401, Punjab (India)


Ref:__________ Dated: 27/04/2012

Certificate

Certified that this project entitled "Speech Compression and Decompression", submitted by Saurabh Lohani (10808634) and Pankaj Singh Negi (10807965), students of the Electronics & Communication Engineering Department, Lovely Professional University, Phagwara, Punjab, in partial fulfillment of the requirements for the award of the Bachelor of Technology (Electronics & Communication Engineering) degree of LPU, is a record of the students' own study carried out under my supervision & guidance.

This report has not been submitted to any other university or institution for the award of any degree.

Name of Mentor:
Miss Lovleen Kaur


Acknowledgement

We would like to express our deep sense of gratitude and indebtedness to Miss Lovleen Kaur, who guided us at all stages in the preparation of this dissertation. This project would not have been possible without her valuable suggestions and encouragement. It would not be out of place to mention here that our revered parents have always been a great source of inspiration to us. Our heads bow in obeisance to them.

We are highly appreciative of all others who directly or indirectly contributed to its completion. Last but not least, all that we are capable of doing we owe to THE ALMIGHTY.

Saurabh Lohani (10808634)

Pankaj Singh Negi (10807965)


Abstract

The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. In the compression technique the digitized samples are represented by equivalent 4 to 8 bit samples. In decompression the compressed samples are expanded back to the original sample size and converted back to analog signals. While the main focus of any speech recognition system (SRS) is to facilitate and improve direct audible man-machine communication and provide an alternative means of access to machines, a speech compression system (SCS) focuses on reducing the amount of redundant data while preserving the integrity of the signals. The compression of speech signals has many practical applications. One example is digital cellular technology, where many users share the same frequency bandwidth; compression allows more users to share the system than would otherwise be possible. Another example is digital voice storage (e.g. answering machines), where, for a given memory size, compression allows longer messages to be stored.


TABLE OF CONTENTS

1. Introduction
2. Speech Representation
3. Compression and Decompression Algorithms
4. Differential Pulse Code Modulation (DPCM)
5. Adaptive Differential Pulse Code Modulation (ADPCM)
6. Hardware Requirements
7. Functioning
8. Application
9. Software Implementation
10. MATLAB Source Code
11. Sampled Speech Signal
12. Calculating Threshold
13. Voiced, Unvoiced and Mixed Speech Frames
14. Performance Measures
    1. Signal to Noise Ratio
    2. Peak Signal to Noise Ratio
    3. Normalized Root Mean Square Error
    4. Retained Signal Energy
    5. Compression Ratios
15. Future Work
    1. Enhancing Quality
    2. Improving Compression Ratio
16. Conclusion


Introduction

Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world, as if the person were in the same room. As a result, a greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today the applications of speech coding and compression have become very numerous. Many applications involve the real-time coding of speech signals, for use in mobile satellite communications, cellular telephony, and audio for videophones or video teleconferencing systems. Other applications include the storage of speech for speech synthesis and playback, or for the transmission of voice at a later time. Some examples include voice mail systems, voice memo wristwatches, voice logging recorders and interactive PC software.

Traditionally, speech coders can be classified into two categories: waveform coders and analysis/synthesis vocoders (from "voice coders"). Waveform coders attempt to copy the actual shape of the signal produced by the microphone and its associated analogue circuits [9]. A popular waveform coding technique is pulse code modulation (PCM), which is used in telephony today. Vocoders use an entirely different approach to speech coding, known as parameter coding, or analysis/synthesis coding, where no attempt is made at reproducing the exact speech waveform at the receiver, only a signal perceptually equivalent to it. These systems provide much lower data rates by using a functional model of the human speaking mechanism at the receiver. One of the most popular techniques for analysis/synthesis coding of speech is called Linear Predictive Coding (LPC). Some higher quality vocoders include RELP (Residual Excited Linear Prediction) and CELP (Code Excited Linear Prediction).

This project looks at a new technique for analysing and compressing speech signals using wavelets. Very simply, wavelets are mathematical functions of finite duration with an average value of zero that are useful in representing data or other functions. Any signal can be represented by a set of scaled and translated versions of a basic function called the "mother wavelet". This set of wavelet functions forms the wavelet coefficients at different scales and positions and results from taking the wavelet transform of the original signal. The coefficients represent the signal in the wavelet domain and all data operations can be performed using just the corresponding wavelet coefficients. Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterised by numerous transitory drifts, trends and abrupt changes. The localisation feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals.
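To make the wavelet representation concrete, the following minimal MATLAB sketch (assuming the Wavelet Toolbox, with a synthetic tone standing in for recorded speech) decomposes a signal into wavelet coefficients at five scales and reconstructs it. Compression, discussed later, comes from discarding small entries of the coefficient vector before reconstruction.

fs = 8000;                       % assumed sampling rate in Hz
t  = 0:1/fs:0.5;
x  = sin(2*pi*200*t);            % stand-in for a recorded speech vector
[C, L] = wavedec(x, 5, 'db10');  % 5-level decomposition, Daubechies-10 wavelet
xr = waverec(C, L, 'db10');      % reconstruction from all coefficients
max(abs(x - xr))                 % reconstruction error (numerically ~0)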


SPEECH REPRESENTATIONS

Extracting information from a speech signal, to be used in a recognition engine or for compression purposes, usually relies on transforming the signal to a domain different from its original state. Although processing a signal in the time domain can be useful for obtaining measures such as the zero-crossing rate, the most important properties of the signal reside in the time-frequency and time-scale domains. This section contains a review and a comparison of the different methods and techniques that allow such extraction. In this report, x(t) represents the continuous speech signal to be analyzed. In order to digitally process a signal x(t), it has to be sampled at a certain rate; 20000 Hz is a standard sampling frequency for digits and the English alphabet. To distinguish the digitized signal in the notation, the latter is referred to as x(m).

Most speech processing schemes assume slow changes in the properties of speech with time, usually every 10-30 milliseconds. This assumption influenced the creation of short-time processing, which suggests the processing of speech in short but periodic segments called analysis frames, or just frames. Each frame is then represented by one number or a set of numbers, and the speech signal then has a new time-dependent representation. In many speech recognition systems, frames of 200 samples at a sampling rate of 8000 Hz (i.e., 200/8000 s = 25 milliseconds) are considered. This segmentation is not error free, since it creates blocking effects that cause rough transitions between the representations (or measurements) of two consecutive frames. To remedy this, a window is usually applied to data of twice the size of the frame, overlapping the consecutive analysis window by 50%. This multiplication of the frame data by a window favors the samples near the center of the window over those at the ends, resulting in a smooth representation. If the window length is not too long, the signal properties inside it remain constant.

Taking the Fourier Transform of the data samples in the window, after adjusting their length to a power of 2 so that the Fast Fourier Transform can be applied, results in the time-dependent Fourier transform, which reveals the frequency-domain properties of the signal. The spectrogram is the plotted estimate of the short-term frequency content of the signal, in which a three-dimensional representation of the speech intensity, in different frequency bands, over time is portrayed. The vertical dimension corresponds to frequency and the horizontal dimension to time. The darkness of the pattern is proportional to the energy of the signal. The resonance frequencies of the vocal tract appear as dark bands in the spectrogram. Mathematically, the spectrogram of a speech signal is the magnitude squared of the Short Time Fourier Transform of that signal. In the literature one can find many different windows that can be applied to the frames of speech signals for a short-term frequency analysis; three of them are depicted in the figure.
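As a concrete illustration of this framing and windowing, the sketch below (assuming the Signal Processing Toolbox, with random noise standing in for speech) computes and plots a spectrogram using 200-sample frames, a window of twice the frame size, and 50% overlap:

fs = 8000;                      % sampling rate used in the text
x  = randn(1, fs);              % stand-in for one second of speech
frame = 200;                    % 25 ms analysis frame (200 samples)
win = hamming(2*frame);         % window spanning twice the frame size
noverlap = frame;               % 50% overlap between analysis windows
nfft = 512;                     % next power of 2 above the window length
spectrogram(x, win, noverlap, nfft, fs, 'yaxis');  % |STFT|^2 display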


Compression and Decompression Algorithms

The simplest way to realize, for example, a voice recorder is to store the A/D conversion results (e.g., 12-bit samples) directly in flash memory. Most of the time, the audio data does not use the complete A/D converter range, which means that redundant information is stored in the flash memory. Compression algorithms remove this redundant information, thereby reducing the data that must be stored. Adaptive differential pulse code modulation (ADPCM) is such a compression algorithm. Various ADPCM algorithms exist; differential coding and adaptation of the quantizer step size are common to all of them. Before taking a closer look at the IMA ADPCM algorithm, which is used in the associated code, a short description of differential PCM coding is given.


Differential Pulse Code Modulation (DPCM)

DPCM encodes the analog audio input signal using the difference between the current and the previous sample. Figure 1 shows a DPCM encoder and decoder block diagram. In this example, the signal difference, d(n), is determined using a signal estimate, Se(n), rather than the previous input. This ensures that the encoder uses the same information available to the decoder. If the true previous input sample were used by the encoder, an accumulation of quantization errors could occur, leading to a drift of the reconstructed signal away from the original input signal. By using a signal estimate as shown in Figure 1, the reconstructed signal, Sr(n), is prevented from drifting away from the original input signal. The reconstructed signal, Sr(n), is the input to the predictor, which determines the next signal estimate, Se(n+1).
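A minimal MATLAB sketch of this loop is given below. It is illustrative only: the step size is fixed and the input is synthetic, but the variable names follow the figure description above.

% Minimal DPCM sketch: the encoder quantizes the difference between the
% input and its own reconstruction, so encoder and decoder track the same
% signal estimate Se(n).
step = 4;                           % fixed quantizer step size (assumed)
x = round(100*sin(2*pi*(0:99)/25)); % stand-in for PCM input samples
Se = 0;                             % initial signal estimate
code = zeros(size(x));
Sr = zeros(size(x));
for n = 1:length(x)
    d = x(n) - Se;                  % difference signal d(n)
    code(n) = round(d/step);        % quantized difference (transmitted)
    Sr(n) = Se + code(n)*step;      % reconstructed sample Sr(n)
    Se = Sr(n);                     % predictor: next estimate Se(n+1)
end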

Figure 2 shows a small part of a recorded audio stream. Analog audio input samples (PCM values) and the differences between successive samples (DPCM values) are compared in the two diagrams in Figure 2. The range of the PCM values is between 26 and 203, a span of 177 steps. The encoded DPCM values are within a range of -44 to 46, a span of 90 steps. Despite a quantizer step size of one, this DPCM encoding already shows a compression of the input data. The range of the encoded DPCM values could be further decreased by selecting a larger quantizer step size.


Adaptive Differential Pulse Code Modulation (ADPCM)

ADPCM is a variant of DPCM that varies the quantization step size. Amplitude variations in speech input signals are seen between different speakers, and between voiced and unvoiced segments of the speech input signal. The adaptation of the quantizer step size takes place every sample and ensures equal encoding efficiency for both low and high input signal amplitudes. Figure 3 shows the modified DPCM block diagram including the step-size adaptation.
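The sketch below adds a toy step-size adaptation to the DPCM loop shown earlier. The real IMA algorithm uses an 89-entry step table and an index-adjustment table; this shortened version only illustrates the idea.

% Toy ADPCM sketch: the step size is chosen from a table whose index moves
% up after large codes and down after small codes (values assumed).
x = round(1000*sin(2*pi*(0:199)/50));      % stand-in input samples
steps = [16 19 23 28 34 41 50 60 73 88];   % toy step-size table
idx = 1; Se = 0;
code = zeros(size(x)); Sr = zeros(size(x));
for n = 1:length(x)
    d = x(n) - Se;                                   % difference signal
    code(n) = max(min(round(d/steps(idx)), 7), -8);  % 4-bit ADPCM code
    Sr(n) = Se + code(n)*steps(idx);                 % decoder-style rebuild
    Se = Sr(n);                                      % next signal estimate
    if abs(code(n)) >= 4                             % large code: step up
        idx = min(idx + 1, length(steps));
    else                                             % small code: step down
        idx = max(idx - 1, 1);
    end
end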


[Block diagram labels: ADSP-2105/2115 processor with reset, interrupt and clock lines; CODEC with mic/amplifier input and speaker output, connected to the processor's serial port; EPROM, RAM, latch and buffer on the system bus; optional hardware consisting of an RS-232 level converter linking the serial port to a PC over the RS-232 bus.]

The ADPCM encoder calculates the signal estimate, Se, by decoding the ADPCM code. This means that the decoder is part of the ADPCM encoder; hence, the encoded audio data stream can only be replayed using the decoder, and the decoder must track the encoder. The initial encoder and decoder signal estimate levels, as well as the step-size adaptation level, must be defined before encoding or decoding starts. Otherwise, the encoded or decoded value could exceed the scale.

HARDWARE

The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. In the compression technique the digitized samples are represented by equivalent 4 to 8 bit samples. In decompression the compressed samples are expanded back to the original sample size and converted back to analog signals. The hardware consists of the DSP processor ADSP-2105/2115 as CPU, a CODEC, EPROM, RAM, amplifier sections, a mic and a speaker. The CODEC is interfaced to the ADSP processor through its serial port. The optional hardware includes a PC serial port interface consisting of a serial I/O port and an RS-232 level converter. The TTL logic levels of the serial port are converted to RS-232 levels using the level converter, so that the system can communicate directly with the standard serial port (COM1/COM2) of a personal computer.

BLOCK DIAGRAM OF SPEECH COMPRESSION AND DECOMPRESSION USING ADSP-2105/2115


FUNCTIONING:

The system is operated through the reset and interrupt switches. Once the system is reset, it is ready to accept speech signals through the mic and CODEC. The analog speech signals are amplified by the pre-amplifier and fed to the CODEC for analog-to-digital conversion. The CODEC transmits the digitized signal to the ADSP-2105/2115 processor, which then compresses the speech data using the ADPCM technique and stores it in RAM. When the processor is interrupted, it reads the compressed data from RAM, expands the data and sends it to the CODEC. The CODEC converts the digital data to an analog signal, which is amplified and output through the speaker.

APPLICATION:

The speech compression and decompression techniques are implemented in applications such as:

1. Cellular phones
2. Voice mail transmission
3. Speech recognition systems
4. Voice storage
5. IVRS (Interactive Voice Response System)

Software Implementation

In this project we make use of the MATLAB software to implement the functionality of this project. MATLAB stands for Matrix Laboratory. According to The MathWorks, its producer, it is a "technical computing environment". We will take the more mundane view that it is a programming language. This section covers much of the language, but by no means all. We aspire at the least to promote a reasonable proficiency in reading the procedures that we write in the language, but choose to address this material to those who wish to use our procedures and write their own programs. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and "knows" how big it is. Moreover, the fundamental operators (e.g. addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation. MATLAB is a programming environment for algorithm development, data analysis, visualization, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages, such as C,


C++, and Fortran. You can use MATLAB in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. For a million engineers and scientists in industry and academia, MATLAB is the language of technical computing.

MATLAB Source Code:

close all; clear all
disp('load speech data');
load speech.dat;
lg = length(speech);
t = [0:1:lg-1]/8000;
disp('loading finished');
disp('mulaw companding')
nspeech = speech/(2^15); % 15 bits
mu = input('input mu =>');
for x = 1:lg
munspeech(x) = mulaw(nspeech(x),1,mu); % mu-law compression
end
disp('finished mu-law companding');
disp('start to quantization')
bits = input('input bits=>');
for x = 1:lg
[pq uindx(x)] = midtread(bits,1,nspeech(x));
[pq muindx(x)] = midtread(bits,1,munspeech(x));
end
%%% transmission
disp('expander');
for x = 1:lg
qunspeech(x) = mtrdec(bits,1,uindx(x));
qmunspeech(x) = mtrdec(bits,1,muindx(x));
end
for x = 1:lg
expnspeech(x) = muexpand(qmunspeech(x),1,mu);
end
quspeech = qunspeech.*2^15;
qspeech = expnspeech.*2^15;
disp('finished')


qerr = speech - qspeech;
subplot(2,1,1), plot(t, speech, 'w', t, qspeech, 'c', t, qspeech-speech, 'r'); grid
subplot(2,1,2), plot(t, speech, 'w', t, quspeech, 'b', t, quspeech-speech, 'r'); grid

disp('speech: original data 15 bits');
disp('quspeech: quantized PCM');
disp('qspeech: mu-law decoded');
disp('SNR speech and qspeech'); calcsnr(speech,qspeech);
disp('SNR speech quspeech'); calcsnr(speech,quspeech);

function qvalue = mulaw(vin, vmax, mu)

vin = vin/vmax;

qvalue = vmax*sign(vin)*log(1+mu*abs(vin))/log(1+mu);

function rvalue = muexpand(y,vmax, mu)

y=y/vmax;

rvalue=sign(y)*(vmax/mu)*((1+mu)^abs(y) -1);
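For illustration, this companding pair behaves as follows for a mid-range sample, taking mu = 255 (the common telephony choice; values rounded):

% Quick numeric check of the pair above with mu = 255:
%   mulaw(0.5, 1, 255)      -> log(1 + 255*0.5)/log(1 + 255) = 0.876
%   muexpand(0.876, 1, 255) -> ((1 + 255)^0.876 - 1)/255     = 0.5
% Small amplitudes are boosted before quantization; expansion inverts this.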

function [ pq, indx ] = midtread(NoBits,Xmax,value)

% function [pq indx] = midtread(NoBits, Xmax, value)

% this routine is created for simulation of a uniform quantizer.

%

% NoBits: number of bits used in quantization.

% Xmax: overload value.

% value: input to be quantized.

% pq: output of quantized value

% indx: codeword integer

% Note: the midtread method is used in this quantizer.

%

if NoBits == 0


pq = 0;

indx=0;

else

delta = 2*abs(Xmax)/(2^NoBits-1);

Xrmax=delta*(2^NoBits/2-1);

if abs(value) >= Xrmax

tmp = Xrmax;

indx=(2^NoBits/2-1);

else

tmp = abs(value);

end

indx=round(tmp/delta);

pq =round(tmp/delta)*delta;

if value < 0

pq = -pq;

indx=-indx;

end

end

function pq = mtrdec(NoBits,Xmax,indx)

% function pq = mtrdec(NoBits, Xmax, indx)

% this routine is created for simulation of a uniform quantizer.

%

% NoBits: number of bits used in quantization.

% Xmax: overload value

% pq: output of quantized value

% indx: codeword integer


% Note: the midtread method is used in this quantizer.

%

if NoBits == 0

pq = 0;

else

delta = 2*abs(Xmax)/(2^NoBits-1);

pq=indx*delta;

end

function snr = calcsnr(speech, qspeech)

% this routine is created for calculation of SNR

% speech: original speech waveform.

% qspeech: quantized speech.
% snr: output SNR in dB.
%
% Note: the midrise method is used in this quantizer.
%
qerr = speech - qspeech;
snr = 10*log10(sum(speech.*speech)/sum(qerr.*qerr))

% Waveform coding using DCT and MDCT for a block size of 16 samples

% main program

close all; clear all

load speech.dat % provided by the instructor

% create scale factors

N=16; % block size

scalef4bits=sqrt(2*N)*[1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768];

scalef3bits=sqrt(2*N)*[256 512 1024 2048 4096 8192 16384 32768];


scalef2bits=sqrt(2*N)*[4096 8192 16384 32768];

scalef1bit=sqrt(2*N)*[16384 32768];

scalef=scalef2bits;

nbits =3;

% ensure the block size to be 16 samples.

x=[speech zeros(1,16-mod(length(speech),16))];

Nblock=length(x)/16;

DCT_code=[]; scale_code=[];

% encoder

for i=1:Nblock

xblock_DCT=dct(x((i-1)*16+1:i*16));

diff=(scalef-(max(abs(xblock_DCT))));

iscale(i)=min(find(diff==min(diff(find(diff>=0))))); %find a scale factor

xblock_DCT=xblock_DCT/scalef(iscale(i)); % scale the input vector

for j=1:16

[DCT_coeff(j) pp]=biquant(nbits,-1,1,xblock_DCT(j));

end

DCT_code=[DCT_code DCT_coeff ];

end

%decoder

Nblock=length(DCT_code)/16;

xx=[];

for i=1:Nblock

DCT_coefR=DCT_code((i-1)*16+1:i*16);

for j=1:16

xrblock_DCT(j)=biqtdec(nbits,-1,1,DCT_coefR(j));


end

xrblock=idct(xrblock_DCT.*scalef(iscale(i)));

xx=[xx xrblock];

end

% Transform coding using MDCT

xm=[zeros(1,8) speech zeros(1,8-mod(length(speech),8)), zeros(1,8)];

Nsubblock=length(x)/8;

MDCT_code=[];

% encoder

for i=1:Nsubblock

xsubblock_DCT=wmdct(xm((i-1)*8+1:(i+1)*8));

diff=(scalef-max(abs(xsubblock_DCT)));

iscale(i)=min(find(diff==min(diff(find(diff>=0))))); %find a scale factor

xsubblock_DCT=xsubblock_DCT/scalef(iscale(i)); % scale the input vector

for j=1:8

[MDCT_coeff(j) pp]=biquant(nbits,-1,1,xsubblock_DCT(j));

end

MDCT_code=[MDCT_code MDCT_coeff];

end

%decoder

% recover the first subblock

Nsubblock=length(MDCT_code)/8;

xxm=[];

MDCT_coeffR=MDCT_code(1:8);

for j=1:8

xmrblock_DCT(j)=biqtdec(nbits,-1,1,MDCT_coeffR(j));


end

xmrblock=wimdct(xmrblock_DCT*scalef(iscale(1)));

xxr_pre=xmrblock(9:16) % recovered first block for overlap and add

for i=2:Nsubblock

MDCT_coeffR=MDCT_code((i-1)*8+1:i*8);

for j=1:8

xmrblock_DCT(j)=biqtdec(nbits,-1,1,MDCT_coeffR(j));

end

xmrblock=wimdct(xmrblock_DCT*scalef(iscale(i)));

xxr_cur=xxr_pre+xmrblock(1:8); % overlap and add

xxm=[xxm xxr_cur];

xxr_pre=xmrblock(9:16); % set for the next overlap

end

subplot(3,1,1);plot(x,'k');grid; axis([0 length(x) -10000 10000])

ylabel('Original signal');

subplot(3,1,2);plot(xx,'k');grid;axis([0 length(xx) -10000 10000]);

ylabel('DCT coding')

subplot(3,1,3);plot(xxm,'k');grid;axis([0 length(xxm) -10000 10000]);

ylabel('W-MDCT coding');

xlabel('Sample number');

function [ tdac_coef ] = wmdct(ipsig)

%

% This function transforms the signal vector using the W-MDCT

% usage:

% ipsig: input signal block of N samples (N = even number)
% tdac_coef: W-MDCT coefficients (N/2 coefficients)

%

N = length(ipsig);

NN =N;

for i=1:NN

h(i) = sin((pi/NN)*(i-1+0.5));

end

for k=1:N/2

tdac_coef(k) = 0.0;

for n=1:N

tdac_coef(k) = tdac_coef(k) + ...

h(n)*ipsig(n)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));

end

end

tdac_coef=2*tdac_coef;

function [ opsig ] = wimdct(tdac_coef)

%

% This function transforms the W-MDCT coefficients back to the signal
% usage:
% tdac_coef: N/2 W-MDCT coefficients
% opsig: output signal block with N samples

%

N = length(tdac_coef);

tmp_coef = ((-1)^(N+1))*tdac_coef(N:-1:1);

tdac_coef = [ tdac_coef tmp_coef];

N = length(tdac_coef);


NN =N;

for i=1:NN

f(i) = sin((pi/NN)*(i-1+0.5));

end

for n=1:N

opsig(n) = 0.0;

for k=1:N

opsig(n) = opsig(n) + ...

tdac_coef(k)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));

end

opsig(n) = opsig(n)*f(n)/N;

end

Openfile.m

function sdata = openfile(fName);
% openfile : function to read a speech file with a .od extension
% call syntax: sdata = openfile(fName);
% --------------------------------
% Read sound file data into a column vector
sdata = dlmread(fName);

Play.m

function play(M);
% PLAYFILE: Plays a sound file which is stored as a vector
% call syntax: playfile(M);
% ------------------------
% Play sound file
soundsc(M, 8000, 8);

Main.m

% Speech Compression Simulation Program
% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
% Compress speech
[tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Decompress speech


rS = decompress(tC,tL, wavelet);

% Performance calculations
[SNR, PSNR, NRMSE] = pefcal(fileName, rS);

Compress.m

function [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Compress : compresses speech signals wavelet coefficients
% Inputs: speech signal file name, wavelet
% Outputs: compressed coefficients, length vector, compression score
% and retained energy
% Call syntax: [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% --------------------------------------
% Initialise other variables
N = 5; % level of decomposition
ALPHA = 1.5; % compression parameter
SORH = 'h'; % hard thresholding
% Read speech file
sdata = openfile(fileName);
% Compute the DWT to level N
[C,L] = wavedec(sdata,N,wavelet);
% Calculate level dependent thresholds
[THR,NKEEP] = lvlThr(C,L,ALPHA);
% Compress signal using hard thresholding
[XC,CXC,LXC,PERF0,PERFL2] = Trunc('lvd',C,L,wavelet,N,THR,SORH);
% Encode coefficients
cC = encode(CXC);
% Transmitted coefficients
tC = cC;
% Transmitted coefficients vector length
tL = L;
% Percentage of zeros
PZEROS = PERF0;
% Retained energy
PNORMEN = PERFL2;
% Compression ratio with encoding
CompRatio = length(sdata)/length(tC)

Decompress.m

function rSignal = decompress(tC,tL, wavelet);
% Decompress : uncompress DWT coefficients and reconstructs signal
% Inputs: encoded wavelet coefficients, coeff vector length
% Output: reconstructed signal


% Call syntax: rSignal = decompress(tC,tL, wavelet);

% -----------------------------------
% Decode coefficients
rC = decode(tC);
% Reconstruct signal from coefficients
rSignal = waverec(rC,tL,wavelet);

Encode.m

function cC = encode(C);
% Encode: function to encode consecutive zero valued coefficients
% Call syntax: cC = encode(C);
% -----------------------------
% Initialise variables
zeroseq = 'flse'; % True if previous array entry was a zero
zerocount = 0; % Count of no of zeros in sequence
j = 1; % Start index value for compressed coefficients
compC = [ ]; % compressed coefficients vector
% Start iterating thru array
for m = 1:length(C)
if (C(m) == 0) & (zeroseq == 'flse') % First zero
compC = [compC C(m)];
j = j+1;
zeroseq = 'true';
zerocount = 1;
% Reached end of array and last value is zero
if m == length(C)
compC = [compC zerocount];
end
elseif (C(m) == 0) & (zeroseq == 'true') % Sequence of zeros
zerocount = zerocount + 1;
% Reached end of array and last value is zero
if m == length(C)
compC = [compC zerocount];
end
elseif (C(m) ~= 0) & (zeroseq == 'true') % End of zeros
compC = [compC zerocount C(m)];
j = j+2;
zeroseq = 'flse';
zerocount = 0;
else % Non-zero entry
compC = [compC C(m)];
j = j+1;
end
end

cC = compC;


Decode.m

function rC = Decode(cC);
% Decode: function to decode consecutive zero valued coefficients
% Call syntax: rC = Decode(cC);
% ----------------------------
% Initialise variables
dcompC = [ ]; % Empty reconstructed coefficients array
i = 1; % Initial index of loop
% Start iterating thru array
while i <= length(cC)
if cC(i) ~= 0 % Non-zero entry
dcompC = [dcompC cC(i)];
i = i + 1;
else % Zero entry
count = cC(i+1);
for m = 1:count % Add zeros
dcompC = [dcompC 0];
end
i = i + 2;
end
end

rC = dcompC;

Pefcal.m

function [SNR, PSNR, NRMSE] = pefcal(fileName, rS);
% Pefcal: Performance Calculations function file
% Calculates Signal to Noise Ratio, Peak Signal to Noise Ratio
% and Normalized Root Mean Square Error
% Get original speech signal
origdata = openfile(fileName);
% Resize reconstructed signal for the mathematics to work
rS = rS(1:length(origdata));
% Signal to Noise Ratio
sqdata = origdata.^2; % Square of original speech signal
sqrS = rS.^2; % Square of reconstructed signal
msqdata = mean(sqdata); % Mean square of speech signal
sqdiff = (sqdata-sqrS); % Square difference
msqdiff = mean(sqdiff); % Mean square difference
SNR = 10*log10(msqdata/msqdiff); % Signal to noise ratio
% Peak Signal to Noise Ratio
N = length(rS); % Length of reconstructed signal
X = max(abs(sqdata)); % Maximum absolute square of orig signal


diff = origdata - rS; % Difference signal
endiff = (norm(diff))^2; % Energy of the difference between the
% original and reconstructed signal
PSNR = 10*log10((N*(X^2))/endiff); % Peak Signal to noise ratio
% Normalised Root Mean Square Error
diffsq = diff.^2; % Difference squared
mdiffsq = mean(diffsq); % Mean of difference squared
mdata = mean(origdata); % Mean of original speech signal
scaledsqS = (origdata - mdata).^2; % Squared scaled data
mscaledsqS = mean(scaledsqS); % Mean of squared scaled data

NRMSE = sqrt(mdiffsq/mscaledsqS); % Normalized Root Mean Square Error

Comp.m

function [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Comp: function simulates real time compression of speech signals
% Inputs: speech signal file name, wavelet and frame size
% If frame size is 0 no frames are used
% Outputs: compressed coefficients and compression ratio
% Call Syntax: [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Initialise other variables
%tC = [ ]; % transmitted coefficients vector
tXC = [ ]; % uncompressed coefficients vector
%lenOrigC = 0; % length of original coefficients
PERF0V = [ ]; % vector of % truncation for each frame
PERFL2V = [ ]; % vector of % retained energy for each frame
for i = 1:numFrames
% Read a frame from the speech file
sdata = FrameSelect(i,frameSize,fileName, fileSize);
% Compute the DWT to level N
[C,L] = wavedec(sdata,N,wavelet);
% Calculate default thresholds
[THR, SORH, KEEPAPP] = gblThr('cmp','wv',sdata);
SORH = 'h';
KEEPAPP = 0; % Can threshold approximation coefficients also


% Compress signal using hard thresholding
[XC,CXC,LXC,PERF0,PERFL2] = Trunc('gbl',C,L,wavelet,N,THR,SORH,KEEPAPP);
% Encode coefficients
cC = encode(CXC);
% Transmitted coefficients
tXC = [tXC cC];
% Truncation % Vector
PERF0V = [PERF0V PERF0];
% Retained Energy Vector
PERFL2V = [PERFL2V PERFL2];
end

% Return Values
tC = tXC;
tL = tXC;
PZEROS = mean(PERF0V);
PNORMEN = mean(PERFL2V);
cScore = fileSize/length(tC);
nFrames = numFrames;

Decomp.m

function rSignal = decomp(tC,tL,wavelet,numFrames,frameSize);
% Decomp: function simulates real time decoding of signals
% Inputs: encoded wavelet coefficients, coeff vector length
% Outputs: reconstructed signal
% Call Syntax: rSignal = decomp(tC,tL,wavelet,numFrames,frameSize);
% Initialise other variables
rS = [ ]; % reconstructed signal
frameSize = sum(tL) - frameSize; % frame size of DWT coefficients
% Decode coefficients
rC = decode(tC);
for i = 1:numFrames
% Range of frame
R1 = (i-1)*frameSize + 1;
R2 = i*frameSize;
% Read coefficients in frame
fC = rC(R1:R2);
% Reconstruct frame signal
X = waverec(fC,tL,wavelet);
% Total reconstructed signal
rS = [rS; X];
end


% Return output
rSignal = rS;

Filesize.m

function fSize = FileSize(fName);
% FileSize: counts no of samples in a speech file
% Call syntax: fSize = FileSize(fName);
% ---------------------
data = OpenFile(fName);

fSize = length(data);

Frameselect.m

function v = FrameSelect(fNum,fSize,fileName,fileSize);
% FrameSelect : reads a frame of data from a speech file into a column vector
% call syntax: v = FrameSelect(fNum,fSize,fileName,fileSize);
% --------------------------------
% Read the corresponding frame from the sound file into a column vector
% range = [R1 C1 R2 C2], C1 = C2 = 0 since only one column
% R1 = First Value, R2 = Last Value
R1 = fSize*(fNum-1);
R2 = (fSize*fNum - 1);
R3 = R2;
% Adjust range value for last frame
if R2 >= fileSize
R2 = fileSize - 1;
end
range = [R1 0 R2 0];
v = dlmread(fileName,'',range);
% If data for last frame is smaller than frame size
% Zero pad the frame
if R3 ~= R2
N = (R3-R2);
for i = 1:N
v = [v;0];
end

end

Optimal.m

% Optimal Wavelet For Speech Compression
% This script file determines the percentage of Speech Frame Energy
% Concentrated by wavelets in the first N/2 Coefficients
% ------------------------------------------------------------------
% Inputs: speech signal file name, wavelet and frame size
% Outputs: compressed coefficients and compression ratio


% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
frameSize = 160;
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Vector to Store Retained Energy of Each Frame
PREV = [ ];
% Step thru each frame and calculate Retained Energy
%for i=1:numFrames
% Read a frame from the speech file
sdata = FrameSelect(8,frameSize,fileName, fileSize);
% Compute the DWT to level 5
[C,L] = wavedec(sdata,5,wavelet);
% Calculate Energy Retained in first N/2 Coefficients
xC = C(1:(length(C)/2));
RE = 100*(norm(xC))^2/(norm(C))^2;
PREV = [PREV ; RE];
%end

PREV

Voiced.m

% Voiced, Unvoiced and Mixed Frames
% This script file plots frames
% ------------------------------------------------------------------
% Inputs: speech signal file name, wavelet and frame size
% Outputs: compressed coefficients and compression ratio
% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
frameSize = 1024;
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Read frame i from the speech file


i = 9;
sdata = FrameSelect(i,frameSize,fileName, fileSize);
% Compute the DWT to level 5
[C,L] = wavedec(sdata,5,wavelet);
% Calculate Energy Retained in first N/2 Coefficients
xC = C(1:(length(C)/2));
RE = 100*(norm(xC))^2/(norm(C))^2;
% Plot frame and wavelet transform coefficients
subplot(2,1,1); plot(sdata,'r'); title('Mixed Speech Segment');

subplot(2,1,2); plot(C); title('DWT Coefficients Using Db10 Wavelet');

Results


Sampled Speech Signals

The sample speech files used for compression are .OD files. These files contain discrete signal values, which can easily be read in and played by MATLAB at a sampling frequency of 8 kHz. Alternatively, WAV files could also be used and processed. MATLAB 6 uses an interpreter to run code without actually compiling it; because of this it is far too slow to design and implement a real-time speech coder in MATLAB alone. Thus real-time speech coding could only be simulated, by dividing the input sample speech into frames of 20 ms (160 samples) and then decomposing and compressing each frame. For processing recorded speech, however, larger frame sizes can be used. Since the speech files used in this design are of very short duration (a few seconds), the entire speech vector could be decomposed without dividing it into frames.
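A minimal sketch of this 20 ms framing (the file name is taken from the listings above; zero-padding of the last frame mirrors what FrameSelect does):

fs = 8000;                          % sampling rate of the .OD files
frameSize = 160;                    % 20 ms frames: 160 samples at 8 kHz
x = dlmread('s180.od');             % discrete sample values (column vector)
numFrames = ceil(length(x)/frameSize);
x(end+1 : numFrames*frameSize) = 0; % zero-pad the last frame
frames = reshape(x, frameSize, numFrames);  % one 20 ms frame per column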


Calculating Thresholds

For the truncation of small-valued transform coefficients, two different thresholding techniques are used: global thresholding and by-level thresholding. The aim of global thresholding is to retain the largest absolute-value coefficients, regardless of the scale in the wavelet decomposition tree. Global thresholds are calculated by setting the percentage of coefficients to be truncated. Level-dependent thresholds are calculated using the Birge-Massart strategy [15]. This thresholding scheme is based on an approximation result from Birge and Massart and is well suited for signal compression. The strategy keeps all of the approximation coefficients at the level of decomposition J. The number of detail coefficients to be kept at level i, for i from 1 to J, is given by the formula (as implemented in MATLAB's wdcbm):

n_i = M / (J + 2 - i)^{\alpha}

α is a compression parameter, typically 1.5. The value of M reflects how sparsely the wavelet coefficients are distributed in the transform vector. If L denotes the length of the coarsest approximation coefficients, then M takes on the values in Table 4.1, depending on the signal being analysed.

Thus this approach to thresholding selects the highest absolute valued coefficients at each level.
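The lvlThr and Trunc routines in the listings are assumed to wrap the standard Wavelet Toolbox calls, which implement exactly this strategy; a minimal sketch, using the openfile helper listed earlier:

ALPHA = 1.5;                               % compression parameter from the text
sdata = openfile('s180.od');               % speech samples
[C, L] = wavedec(sdata, 5, 'db10');        % 5-level DWT
[THR, NKEEP] = wdcbm(C, L, ALPHA);         % Birge-Massart level thresholds
[XC, CXC, LXC, PERF0, PERFL2] = wdencmp('lvd', C, L, 'db10', 5, THR, 'h');
% PERF0: percentage of zeroed coefficients; PERFL2: retained energy (%)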

Encoding Zero-Valued Coefficients

After zeroing wavelet coefficients with negligible values, based either on calculated threshold values or simply on a selected truncation percentage, the transform vector needs to be compressed. In this implementation, consecutive zero-valued coefficients are encoded with two bytes: one byte is used to mark the start of a string of zeros and the second byte keeps track of the number of successive zeros. Due to the sparsity of the wavelet representation of the speech signal, this encoding method leads to higher compression ratios than storing the non-zero coefficients along with their respective positions in the wavelet transform vector, as suggested in


the Literature Review. This encoding scheme is the primary means of achieving signal compression. In MATLAB, however, the coding of this compression algorithm using vectors results in relatively slow performance, with unacceptable delays for real-time speech coding. This encoding process can be sped up significantly by programming it in another language such as C++.
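As a quick illustration of the scheme, the encode/decode pair from the source code listings behaves as follows on a short example vector:

% Round-trip check of the zero run-length scheme (illustrative values):
C  = [5 0 0 0 0 -3 2 0 0 4];
cC = encode(C);        % -> [5 0 4 -3 2 0 2 4]: each zero run becomes (0, count)
rC = decode(cC);       % restores the original coefficient vector
isequal(C, rC)         % true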

Voiced, Unvoiced and Mixed Speech Frames

From the speech signal analysed previously, three different types of speech segments can be identified, based on the amount of energy the wavelet concentrates in the first N/2 coefficients. The figures below show each speech frame with its wavelet decomposition at level 5, using the Daubechies 10 wavelet. The structure of the plotted wavelet transform vector is [cA5, cD5, cD4, cD3, cD2, cD1] and the lengths of the respective coefficient vectors are (50, 50, 81, 144, 270, 521). The total length of the coefficients is 1116, which is greater than the frame size of 1024.


Performance Measures:

A number of quantitative parameters can be used to evaluate the performance of the wavelet based speech coder, in terms of both reconstructed signal quality after decoding and compression scores. The following parameters are compared:

1. Signal to Noise Ratio (SNR)
2. Peak Signal to Noise Ratio (PSNR)
3. Normalised Root Mean Square Error (NRMSE)
4. Retained Signal Energy
5. Compression Ratios


The results obtained for the above quantities are calculated using the following formulas:

1. Signal to Noise Ratio

SNR = 10 \log_{10} ( \sigma_x^2 / \sigma_e^2 )

where \sigma_x^2 is the mean square of the speech signal and \sigma_e^2 is the mean square difference between the original and reconstructed signals.

2. Peak Signal to Noise Ratio

PSNR = 10 \log_{10} ( N X^2 / \| x - r \|^2 )

where N is the length of the reconstructed signal, X is the maximum absolute square value of the signal x, and \| x - r \|^2 is the energy of the difference between the original and reconstructed signals.

3. Normalised Root Mean Square Error

NRMSE = \sqrt{ \sum_n ( x(n) - r(n) )^2 / \sum_n ( x(n) - \mu_x )^2 }

where x(n) is the speech signal, r(n) is the reconstructed signal, and \mu_x is the mean of the speech signal.

4. Retained Signal Energy

RSE = 100 \cdot \| r(n) \|^2 / \| x(n) \|^2

where \| x(n) \| is the norm of the original signal and \| r(n) \| is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets the retained energy is equal to the L2-norm recovery performance.


5. Compression Ratio

CR = length(x(n)) / cWC

where cWC is the length of the compressed wavelet transform vector.

Future Work

Enhancing Quality

Listening tests conducted on a male-spoken sentence, "Cats and dogs each hate the other", using different wavelets revealed that the /s/ sound in the word "dogs" tends to be slightly distorted, and if not heard carefully can be mistaken for just the singular term. The /s/ sound is an unvoiced excitation. The transforms of three different frames from this speech signal were analysed in an earlier section. The figure shows that for an unvoiced speech frame the wavelet coefficients are spread across all frequency bands. Therefore this frame will not undergo significant compression when a threshold is applied, and if a relatively low value is used for the truncation threshold, the reconstructed signal frame will be severely distorted. Thus wavelets are inefficient at coding unvoiced speech frames. By detecting unvoiced speech frames and directly encoding them using some form of bit encoding, like entropy coding, no unvoiced data is lost and there is only a marginal increase in the bit rate. A cheap wavelet with few vanishing moments can be used to detect voiced or unvoiced speech frames, using the technique suggested in the literature.

Improving Compression Ratios

Further data compaction is possible by exploiting the redundancy in the encoded transform coefficients. A bit encoding scheme could be used to represent the data more efficiently. A common lossless coding technique is entropy coding. Two common entropy coding schemes are prefix coding and tree-structured Huffman coding. Both these forms of entropy coding require prior knowledge of the nature of the source data, such as the probability distribution of the source output data. In practice, however, probabilistic models are usually not known a priori, so a model of the data must be constructed from the data set itself. An example of a data compaction code that encodes directly from the data stream, without constructing an explicit model, is the Ziv-Lempel code. Ziv-Lempel coding is a universal variable-to-fixed-length data compaction code that is practical, has good performance and does not require an externally constructed source model. The use of such a scheme in the last stage of the wavelet transform speech coder would enable the transmission of voice at low bit rates.
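A minimal sketch of such an entropy-coding stage, using the MATLAB Communications Toolbox Huffman functions (an assumption; the report does not specify an implementation, and the symbol probabilities here are illustrative, whereas in practice they would be estimated from the encoded coefficient stream itself):

symbols = [-2 -1 0 1 2];                 % example quantized coefficient values
prob    = [0.1 0.2 0.4 0.2 0.1];         % assumed source probabilities
dict    = huffmandict(symbols, prob);    % build the Huffman code table
sig     = [0 0 1 -1 0 2 0 0 -2 0];       % stand-in coefficient stream
bits    = huffmanenco(sig, dict);        % variable-length bit stream
rec     = huffmandeco(bits, dict);       % lossless recovery
isequal(sig, rec)                        % true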


Conclusion:

Speech coding is currently an active topic for research in the areas of Very Large Scale Integrated (VLSI) circuit technology and Digital Signal Processing (DSP). The Discrete Wavelet Transform performs very well in the compression of recorded speech signals. For real-time speech processing, however, its performance is not as good; therefore, for real-time speech coding it is recommended to use a wavelet with a small number of vanishing moments at a decomposition level of 5 or less. The wavelet-based compression software designed here reaches a signal to noise ratio of 17.45 dB at a compression ratio of 3.88 using the Daubechies 10 wavelet. The performance of the wavelet scheme, in terms of compression scores and signal quality, is comparable with other good techniques such as code excited linear predictive (CELP) coding for speech, with much less computational burden. In addition, using wavelets the compression ratio can easily be varied, while most other compression techniques have fixed compression ratios.