Speech Coding Submitted To: Dr. Mohab Mangoud Submitted By: Nidal Ismail.


Outline

1. Introduction
• Overview of Speech Coding
• Properties of a Speech Coder
• Modeling the Speech Production System
• Linear Prediction

2. Different Coding Techniques
• Waveform Coders
• Parametric Coders
• Hybrid Coders
• Coding Standards

3. PCM & DPCM

4. Linear Predictive Coding

5. Conclusion

6. References

1. Introduction

Block Diagram of a speech coding system

Sampling frequency = 8 kHz
Number of bits per sample = 8
Bit rate = 8 bits × 8 kHz = 64 kbps
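The bit-rate figure follows directly from the sampling frequency and the word length. A minimal sketch of that arithmetic; the function name is illustrative and not from the slides:

```python
def pcm_bit_rate(sampling_hz, bits_per_sample):
    """Bit rate in bits per second for uncompressed PCM."""
    return sampling_hz * bits_per_sample


rate_bps = pcm_bit_rate(8000, 8)   # 8 kHz sampling, 8 bits per sample
print(rate_bps / 1000, "kbps")
```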

Overview of Speech Coding

Properties of a Speech Coder

• Low bit-rate
• High speech quality
• Robustness across different speakers / languages
• Robustness in the presence of channel errors
• Good performance on nonspeech signals
• Low memory size and low computational complexity
• Low coding delay


Modeling the Speech Production System


Speech = Voiced + Unvoiced sounds


Autocorrelation values for the signal frames. Left: Unvoiced. Right: Voiced.
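The contrast between the two autocorrelation plots can be reproduced on synthetic frames. The sketch below is only an illustration: the sinusoid and the white noise are stand-ins for voiced and unvoiced speech (an assumption, not the data behind the figure), and all names are hypothetical.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """R[l] = sum over n of frame[n] * frame[n - l], for l = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


# Synthetic stand-ins, not real speech (assumption): a sinusoid with an
# 80-sample period for "voiced", uniform white noise for "unvoiced".
PERIOD = 80
voiced = [math.sin(2 * math.pi * i / PERIOD) for i in range(240)]
rng = random.Random(0)
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(240)]

r_voiced = autocorrelation(voiced, 120)
r_unvoiced = autocorrelation(unvoiced, 120)

# Search away from lag 0, where the autocorrelation is always largest.
lags = range(40, 121)
voiced_peak_lag = max(lags, key=lambda l: r_voiced[l])
unvoiced_peak = max(abs(r_unvoiced[l]) / r_unvoiced[0] for l in lags)
print("voiced peak lag:", voiced_peak_lag)
print("largest normalized unvoiced value:", round(unvoiced_peak, 3))
```

The periodic frame shows a strong peak at its period, while the noise frame has no comparable peak away from lag 0.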

• The signal from a source is filtered by a time-varying filter with resonant properties similar to those of the vocal tract.

• The gain controls Av and AN determine the intensity of the voiced and unvoiced excitation.

• Higher-frequency formants are attenuated at a rate of 12 dB/octave (due to the nature of our speech organs).


Linear Prediction

Linear prediction as system identification.

Linear prediction is a practical method of spectrum estimation, where the power spectral density (PSD) can be captured using a few coefficients.

These linear prediction coefficients can be used to construct the synthesis filter.


Predicted Signal

Prediction error
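The equations for these two quantities did not survive in the transcript. A minimal sketch of linear prediction is given here under one common sign convention (an assumption; some texts negate the coefficients): the predicted signal is s_hat[n] = a1·s[n-1] + ... + aM·s[n-M] and the prediction error is e[n] = s[n] - s_hat[n]. The coefficients are found from the autocorrelation values with the Levinson-Durbin recursion; the test signal and all function names are illustrative.

```python
import random


def autocorr(s, max_lag):
    """Autocorrelation values R[0..max_lag] of the whole signal."""
    return [sum(s[n] * s[n - lag] for n in range(lag, len(s)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns ([a1..aM], final error energy)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err              # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def prediction_error(s, coeffs):
    """e[n] = s[n] - sum_k coeffs[k] * s[n - 1 - k]."""
    m = len(coeffs)
    return [s[n] - sum(coeffs[k] * s[n - 1 - k] for k in range(m))
            for n in range(m, len(s))]


# Synthetic test signal (assumption): a second-order autoregressive process.
rng = random.Random(1)
s = [0.0, 0.0]
for _ in range(4000):
    s.append(1.3 * s[-1] - 0.6 * s[-2] + rng.gauss(0.0, 1.0))

coeffs, _ = levinson_durbin(autocorr(s, 2), 2)
e = prediction_error(s, coeffs)
gain = sum(x * x for x in s) / sum(x * x for x in e)
print("estimated coefficients:", [round(c, 2) for c in coeffs])
print("prediction gain:", round(gain, 2))
```

The estimated coefficients should land close to the 1.3 and -0.6 used to generate the signal, which is the "system identification" view of linear prediction.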





2. Different Coding Techniques

Waveform Coders

• The original shape of the signal waveform is preserved.
• These coders can be applied to any signal source.
• They are better suited for high bit-rate coding, since performance drops sharply with decreasing bit-rate.
• In practice, these coders work best at a bit-rate of 32 kbps and higher.
• Examples of this class include various kinds of pulse code modulation (PCM) and adaptive differential PCM (ADPCM).

Parametric Coders

• The speech signal is generated from a model, which is controlled by some parameters.
• The parameters are estimated from the input speech signal.
• No attempt is made to preserve the original shape of the waveform.
• The accuracy and sophistication of the model determine the quality.
• The most successful model is based on linear prediction. In this approach, the human speech production mechanism is summarized using a time-varying filter, with the coefficients of the filter found using the linear prediction analysis procedure.
• This class of coders works well at low bit-rates, in the range of 2 to 5 kbps.
• Example coders of this class include linear prediction coding (LPC) and mixed excitation linear prediction (MELP).


Hybrid Coders

• Combine the strengths of a waveform coder with those of a parametric coder.
• As in waveform coders, an attempt is made to match the original signal with the decoded signal in the time domain.
• This class dominates the medium bit-rate coders, with the code-excited linear prediction (CELP) algorithm and its variants as the most outstanding representatives.
• A hybrid coder tends to behave like a waveform coder at high bit-rates and like a parametric coder at low bit-rates, with fair to good quality at medium bit-rates.



Coding Standards





3. PCM & DPCM

Pulse Code Modulation

Invented 1926, deployed 1962.

Basic idea: assign a smaller quantization step size to small-amplitude regions and a larger quantization step size to large-amplitude regions (non-uniform quantization).

Two types of nonlinear compressing functions:
• Mu-law, adopted by North American telecommunications systems
• A-law, adopted by European telecommunications systems

Mu-law (A-law) companding compresses the signal to 8 bits/sample, or 64 kbit/s at 8 kHz sampling (without a compandor, we would need 12 bits/sample).

μ-law

In the μ-law characteristic, A is the peak input magnitude and μ is a constant that controls the degree of compression.
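The μ-law equation itself is missing from the transcript. The sketch below assumes the standard textbook form, y = A · ln(1 + μ|x|/A) / ln(1 + μ) · sgn(x), with μ = 255 as used in North American telephony; the function names and default values are illustrative.

```python
import math


def mu_law_compress(x, peak=1.0, mu=255.0):
    """Compress x with the mu-law characteristic (peak is A in the text)."""
    y = peak * math.log1p(mu * abs(x) / peak) / math.log1p(mu)
    return math.copysign(y, x)


def mu_law_expand(y, peak=1.0, mu=255.0):
    """Inverse of mu_law_compress."""
    x = peak / mu * math.expm1(abs(y) / peak * math.log1p(mu))
    return math.copysign(x, y)


for sample in (0.01, 0.1, 0.5, 1.0):
    compressed = mu_law_compress(sample)
    print(sample, "->", round(compressed, 4),
          "->", round(mu_law_expand(compressed), 4))
```

Small amplitudes are stretched over a much larger share of the output range, which is what makes a uniform quantizer applied after compression behave like a non-uniform one.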


μ-law examples


A-law



In the A-law characteristic, A0 is a constant that controls the degree of compression.
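The A-law equation is also missing from the transcript. This sketch assumes the standard two-piece form: linear for |x|/A ≤ 1/A0 and logarithmic above that, with both pieces divided by 1 + ln(A0). The value A0 = 87.6 is the one commonly quoted for European telephony and is an assumption here, as are the function names.

```python
import math

A0 = 87.6  # assumption: commonly quoted compression constant


def a_law_compress(x, peak=1.0, a0=A0):
    """Compress x with the A-law characteristic (peak is A in the text)."""
    ratio = abs(x) / peak
    denom = 1.0 + math.log(a0)
    if ratio <= 1.0 / a0:
        y = a0 * ratio / denom                     # linear segment
    else:
        y = (1.0 + math.log(a0 * ratio)) / denom   # logarithmic segment
    return math.copysign(peak * y, x)


def a_law_expand(y, peak=1.0, a0=A0):
    """Inverse of a_law_compress."""
    ratio = abs(y) / peak
    denom = 1.0 + math.log(a0)
    if ratio <= 1.0 / denom:
        x = ratio * denom / a0
    else:
        x = math.exp(ratio * denom - 1.0) / a0
    return math.copysign(peak * x, y)


for sample in (0.005, 0.1, 0.5, 1.0):
    compressed = a_law_compress(sample)
    print(sample, "->", round(compressed, 4),
          "->", round(a_law_expand(compressed), 4))
```

The two segments meet at |x|/A = 1/A0, where both evaluate to 1 / (1 + ln A0), so the characteristic is continuous.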

A-law examples

Differential Pulse Code Modulation

Since speech signals vary slowly, much of the temporal redundancy can be removed by prediction.

Quantizing the prediction-error signal

The quantization indices i[n] are fed into the quantizer’s decoder to obtain the quantized prediction error, which is added to the prediction xp[n] to form the quantized input.

DPCM encoder (top) and decoder (bottom)
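The encoder and decoder loop can be sketched as follows. This is a simplified illustration, not the exact system in the diagram: it assumes a first-order predictor xp[n] = a · xq[n-1] with a = 0.9 and a uniform quantizer, and every name and parameter value is hypothetical.

```python
import math


def quantize_index(e, step, bits):
    """Uniform quantizer: prediction error -> clipped integer index i[n]."""
    half = 1 << (bits - 1)
    return max(-half, min(half - 1, int(round(e / step))))


def dpcm_encode(x, step, bits, a=0.9):
    """Return the index sequence i[n] for the input samples x[n]."""
    xq_prev = 0.0
    indices = []
    for sample in x:
        xp = a * xq_prev                      # prediction xp[n]
        i = quantize_index(sample - xp, step, bits)
        indices.append(i)
        xq_prev = xp + i * step               # quantized input, as in the decoder
    return indices


def dpcm_decode(indices, step, a=0.9):
    """Rebuild the quantized input from the indices."""
    xq_prev = 0.0
    out = []
    for i in indices:
        xq_prev = a * xq_prev + i * step
        out.append(xq_prev)
    return out


# Slowly varying test input (assumption): one sinusoid, 100 samples per cycle.
x = [math.sin(2 * math.pi * n / 100) for n in range(200)]
STEP, BITS = 0.02, 4
indices = dpcm_encode(x, STEP, BITS)
decoded = dpcm_decode(indices, STEP)
max_error = max(abs(u - v) for u, v in zip(x, decoded))
print("largest reconstruction error:", round(max_error, 4))
```

Because the encoder predicts from its own quantized output rather than from the clean input, the decoder tracks it exactly and the quantization error does not accumulate.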


PCM quantized Signal (left) and Quantization error (right)

DPCM quantized Signal (left) and Quantization error (right)

Comparison between PCM and DPCM

DPCM used half the bit rate of PCM and still achieved a higher SNR.




4. Linear Predictive Coding


Linear prediction coding relies on a highly simplified model for speech production.

The LPC model of speech production

Parameters of the model are estimated from the speech samples. These include:

• Voicing: whether the frame is voiced or unvoiced.
• Gain: mainly related to the energy level of the frame.
• Filter coefficients: specify the response of the synthesis filter.
• Pitch period: in the case of voiced frames, the time length between consecutive excitation impulses.


By carefully allocating bits for each parameter so as to minimize distortion, an impressive compression ratio can be achieved.

For instance, the 2.4 kbps bit-rate of the FS1015 coder is 53.3 times lower than the 128 kbps needed for 16-bit PCM at 8 kHz sampling.

Estimating the parameters is the responsibility of the encoder.

The decoder takes the estimated parameters and uses the speech production model to synthesize speech.
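A toy version of the decoder side is sketched below: the four parameters drive an excitation (impulse train for voiced frames, noise for unvoiced frames) through an all-pole synthesis filter. This is a simplified illustration under assumed conventions, not the FS1015 decoder; the function name, the filter sign convention s[n] = gain · u[n] + a1·s[n-1] + ... and the example values are all hypothetical.

```python
import random


def synthesize_frame(voiced, gain, coeffs, pitch_period, n_samples, seed=0):
    """Synthesize one frame from LPC parameters with an all-pole filter."""
    if voiced:
        # Impulse train: one unit impulse every pitch_period samples.
        excitation = [1.0 if n % pitch_period == 0 else 0.0
                      for n in range(n_samples)]
    else:
        # White Gaussian noise excitation.
        rng = random.Random(seed)
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    history = [0.0] * len(coeffs)   # history[0] holds s[n-1]
    out = []
    for u in excitation:
        s = gain * u + sum(a * h for a, h in zip(coeffs, history))
        out.append(s)
        history = ([s] + history[:-1]) if coeffs else history
    return out


voiced_frame = synthesize_frame(True, 2.0, [0.5], 4, 8)
unvoiced_frame = synthesize_frame(False, 0.1, [0.5], 0, 8)
print(voiced_frame[:5])
```

Each impulse excites the filter, and the filter's decaying response shapes the output between impulses; for unvoiced frames the pitch period is simply ignored.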


Block diagram of the LPC encoder.


Block diagram of the LPC decoder.

The voicing detector is a key element of successful coding. Its purpose is to classify a given frame as voiced or unvoiced.

Measurements that a voicing detector relies on to accomplish its task:

• Energy (or magnitude sum function)
• Zero crossing rate
• Prediction gain


Top left: A speech waveform. Top right: Magnitude sum function. Bottom left: Zero crossing rate. Bottom right: Prediction gain.
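The first three measurements are simple to compute per frame. The sketch below uses plain definitions (sum of squares, sum of magnitudes, and fraction of adjacent sample pairs that change sign); the exact normalization used for the plots is not given in the slides, so these forms and the function names are assumptions. Prediction gain is omitted because it additionally needs the linear prediction analysis.

```python
def energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def magnitude_sum(frame):
    """Sum of absolute sample values, a cheaper stand-in for energy."""
    return sum(abs(x) for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Toy frames (assumption): rapid sign changes vs. a smooth positive hump.
noisy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
smooth = [1.0, 2.0, 3.0, 2.0, 1.0, 0.5]
for name, frame in (("noisy", noisy), ("smooth", smooth)):
    print(name, energy(frame), magnitude_sum(frame),
          zero_crossing_rate(frame))
```

Voiced frames typically show high energy and a low zero crossing rate, while unvoiced frames show the opposite, so a detector can combine these measurements with thresholds.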


Bit rate: 2.4 kbps
Samples per frame: 180
Frame size: 22.5 ms (44.44 frames/s)
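These figures are tied together by the sampling rate. A short sketch of the arithmetic, assuming the 8 kHz sampling frequency from the introduction (the constant names are illustrative):

```python
SAMPLE_RATE_HZ = 8000      # assumption: same sampling rate as in the introduction
SAMPLES_PER_FRAME = 180
BIT_RATE_BPS = 2400

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
frames_per_s = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME
bits_per_frame = BIT_RATE_BPS * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ

print("frame size (ms):", frame_ms)
print("frames per second:", round(frames_per_s, 2))
print("bits available per frame:", bits_per_frame)
```

The last figure is the bit budget that has to be shared among the voicing, gain, pitch period, and filter coefficient parameters of each frame.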


Speech Coder Standards

• FS1015 LPC10: 10 coefficients
• FS1016 CELP: code-excited
• MELP: mixed excitation
• IS-54 VSELP: vector-sum excited
• IS-96 QCELP: Qualcomm code-excited
• G.728 LD-CELP: low-delay code-excited
• G.729 CS-ACELP: conjugate-structure algebraic-code-excited

5. Conclusion

An overview of speech coding was presented, with a brief explanation of the speech production model. The properties of different coding techniques were also compared. For wireline transmission, PCM and DPCM were covered. Linear predictive coding, which is the basis for the coders used in modern wireless systems, was also introduced.

6. References

Wai C. Chu, Speech Coding Algorithms

Bernard Sklar, Digital Communications
