8 Speech Coder17!9!13

Speech Coders 1 Prof. Hardip K. Shah, EC Dept., DDU 10/4/2013


INTRODUCTION

• Speech coding is a form of source coding

• It determines

– Quality of the recovered speech

– Capacity of the system

• Requirement

– Low bit rate output with toll-quality speech

• 2G speech coders

– Half-rate coders support twice the number of users in a single channel while retaining toll quality


Goal of Speech Coders

• To transmit speech with highest possible quality using the least possible channel capacity.

• To maintain certain required levels of complexity of implementation and communication delay


CLASSIFICATION


Characteristics of Speech Signal

• Band limited signal

• Non uniform pdf of speech amplitude

• Non zero autocorrelation between successive speech samples

• Non flat nature of speech spectra

• Existence of voiced and unvoiced segments

• Quasi periodicity of voiced segment


Speech spectrum


Speech production

• Generation: air is forced from the lungs

• The vocal cords vibrate

• The nasal cavity and vocal tract resonate, shaping the sound

• Classification of speech signals:

1) Voiced sounds: “m”, “n”, “v” etc. result from quasi-periodic vibrations of the vocal cords

2) Unvoiced sounds: “f”, “s”, “sh” etc. are fricatives produced by turbulent air flow through a constriction


Speech parameters

– Voice pitch

– Pole frequencies of the modulating filter

– Corresponding amplitude parameters

• Pitch: For certain voiced sounds, the vocal cords vibrate (open and close). The rate at which they vibrate determines the pitch of the voice.

• The pole frequencies correspond to the resonant frequencies of the vocal tract and are known as formants.
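Since pitch is the vibration rate of the vocal cords, it shows up as the repetition period of a voiced frame. The following is a minimal sketch of one common way to estimate it, by picking the lag with the largest autocorrelation. The function names, the 8 kHz sampling rate, the 50-400 Hz search range, and the synthetic test frame are all assumptions made for illustration; they do not come from the slides.

```python
def autocorr(frame, lag):
    """Biased autocorrelation of one frame at a single lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch_hz(frame, fs=8000, f_min=50.0, f_max=400.0):
    """Estimate pitch as fs divided by the lag with the largest autocorrelation."""
    lag_min = int(fs / f_max)   # shortest pitch period searched, in samples
    lag_max = int(fs / f_min)   # longest pitch period searched, in samples
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: autocorr(frame, lag))
    return fs / best_lag


# Synthetic "voiced" frame (assumption): a decaying pulse that repeats
# every 80 samples, i.e. a 100 Hz pitch at 8 kHz sampling.
frame = [0.95 ** (n % 80) for n in range(400)]
pitch_hz = estimate_pitch_hz(frame)
print(pitch_hz)
```

Real speech needs extra care (voiced/unvoiced detection, octave errors), which this sketch leaves out.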


The Speech Signal

[Figure: speech waveform annotated with the pitch period, background signal, and unvoiced signal (noise-like sound)]


Quantization Techniques

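• Process of mapping a continuous range of amplitudes of a signal into a finite set of discrete amplitudes

• Irreversible process

• Introduces distortion

• Performance measures

– Mean squared error: $\mathrm{MSE} = E[(x - f_Q(x))^2]$, where $f_Q(x)$ is the quantized value of $x$

– Signal to quantization noise ratio: $\mathrm{SQNR} = 6.02\,n + \alpha$ dB, where $n$ is the number of bits per sample and $\alpha$ depends on the signal and the quantizer (for example, 1.76 for a full-scale sinusoid with a uniform quantizer)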
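The 6.02 dB-per-bit rule can be checked numerically. The sketch below quantizes a full-scale sine wave with a mid-rise uniform quantizer and compares the measured SQNR with 6.02n + 1.76, the well-known value for that specific test signal. The quantizer design, the test signal, and all function names are assumptions chosen for illustration.

```python
import math


def uniform_quantize(x, n_bits, x_max=1.0):
    """Mid-rise uniform quantizer with 2**n_bits levels over [-x_max, x_max]."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    index = math.floor(x / step)
    # Clip so that inputs at the edge of the range stay on a valid level.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step


def sqnr_db(signal, n_bits):
    """Measured signal to quantization noise ratio in dB."""
    quantized = [uniform_quantize(x, n_bits) for x in signal]
    signal_power = sum(x * x for x in signal)
    noise_power = sum((x - q) ** 2 for x, q in zip(signal, quantized))
    return 10.0 * math.log10(signal_power / noise_power)


# Full-scale sine wave used as the test input (assumption).
signal = [math.sin(2 * math.pi * 0.01234 * n) for n in range(20000)]
for n_bits in (6, 8, 10):
    measured = sqnr_db(signal, n_bits)
    predicted = 6.02 * n_bits + 1.76
    print(n_bits, round(measured, 2), round(predicted, 2))
```

Each extra bit halves the step size, which is where the roughly 6 dB improvement per bit comes from.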


Types of quantization

• Uniform quantization

• Non-uniform quantization

– Distributes quantization levels in accordance with the pdf of the input waveform

– In telephony, logarithmic quantizers are used

• Adaptive quantization

– Speech is a non-stationary signal

– Long-term and short-term pdfs are different

– Dynamic range of 40 dB
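As an illustration of logarithmic quantization, the sketch below uses μ-law companding with μ = 255, one widely used telephony law (the slides do not say which law they have in mind, so this choice is an assumption). The signal is compressed, quantized uniformly, and expanded again, which gives small amplitudes finer effective steps than plain uniform quantization. Function names are hypothetical.

```python
import math

MU = 255.0  # assumed companding constant for this sketch


def mu_compress(x, mu=MU):
    """mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_uniform(x, n_bits):
    """Mid-rise uniform quantizer over [-1, 1]."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    index = max(-(levels // 2), min(levels // 2 - 1, math.floor(x / step)))
    return (index + 0.5) * step


def quantize_mu_law(x, n_bits):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(quantize_uniform(mu_compress(x), n_bits))


x = 0.01  # a low-level sample, where uniform quantization is coarse
print(abs(x - quantize_uniform(x, 8)), abs(x - quantize_mu_law(x, 8)))
```

For this low-level sample the companded quantizer leaves a noticeably smaller error than the uniform one at the same 8 bits.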


Adaptive quantizers


Vector Quantization

• Shannon’s rate-distortion theorem

– There exists a mapping from a source waveform to output code words such that, for a given distortion D, R(D) bits per sample are sufficient to reconstruct the waveform.

• R(D): rate-distortion function

• Better performance is obtained by coding many samples at a time

• $R = \frac{\log_2 n}{L}$ bits per sample

– n: size of the VQ codebook, L: number of samples in a block
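The rate formula and the encoding step can be sketched in a few lines. The toy two-dimensional codebook below is hypothetical and serves only to show a nearest-neighbour search; the rate function simply evaluates R = log2(n)/L.

```python
import math


def vq_rate_bits_per_sample(codebook_size, block_length):
    """R = log2(n) / L, in bits per sample."""
    return math.log2(codebook_size) / block_length


def vq_encode(block, codebook):
    """Return the index of the codeword closest to `block` (squared error)."""
    return min(range(len(codebook)),
               key=lambda i: sum((b - c) ** 2 for b, c in zip(block, codebook[i])))


# Toy codebook (hypothetical): n = 4 codewords, L = 2 samples per block.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
index = vq_encode((0.9, 1.2), codebook)
rate = vq_rate_bits_per_sample(len(codebook), 2)
print(index, rate)

# A 10-bit codebook (n = 1024) over 40-sample blocks needs 0.25 bits per sample.
print(vq_rate_bits_per_sample(1024, 40))
```

Only the index is transmitted; the decoder looks up the same codebook to reconstruct the block.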


Frequency domain Coders

• Speech is divided into several frequency components

• The frequency components are quantized and coded separately

– Sub Band coders (SBC)

– Block Transform coding


SBC

• Controls and distributes quantization noise across the signal spectrum

• The human ear does not detect quantization error at all frequencies equally well.

• Speech is divided into four or eight bands

Sub-band number   Frequency range (Hz)   Assigned bits
1                 200-700                4
2                 700-1310               3
3                 1310-2020              2
4                 2020-3200              2
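The bit allocation in the table can be turned into a minimum encoding rate. The sketch below assumes each sub-band is sampled at twice its own bandwidth (its Nyquist rate) and ignores any side information; that sampling assumption is not stated on the slide.

```python
# (low edge in Hz, high edge in Hz, bits per sample) for each sub-band in the table.
BANDS = [(200, 700, 4), (700, 1310, 3), (1310, 2020, 2), (2020, 3200, 2)]


def sbc_bit_rate(bands):
    """Total bit rate in bits per second, sampling each band at 2 x bandwidth."""
    total = 0
    for f_low, f_high, bits in bands:
        sample_rate = 2 * (f_high - f_low)   # assumed Nyquist-rate sampling
        total += sample_rate * bits
    return total


total_bps = sbc_bit_rate(BANDS)
print(total_bps / 1000.0, "kbps")
```

The lowest band gets the most bits per sample, which reflects how the bit allocation pushes quantization noise toward bands where it matters less.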


SBC


Vocoders


LPC

• Time-domain vocoder

• Computationally intensive, but more popular than other vocoders

• Good quality voice at 4.8 kbps and poorer quality voice at even lower rates

• Models the vocal tract as an all-pole filter

• Transmits the gain factor, voiced/unvoiced decision, pitch frequency, and predictor coefficients
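The predictor coefficients of the all-pole model are commonly obtained from the frame's autocorrelation with the Levinson-Durbin recursion. The sketch below shows that approach under simplifying assumptions: no windowing, no pre-emphasis, and a toy frame generated from a known one-pole filter so the result is easy to check. The function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[0..order-1].

    The predictor is s_hat[n] = sum(a[k] * s[n - 1 - k] for k in range(order)).
    Returns (coefficients, residual_energy); the residual energy is what a
    gain factor would be derived from.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                      # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a, err


# Toy frame (assumption): impulse response of the all-pole filter 1 / (1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs, residual_energy)
```

For this frame the first coefficient recovers the pole at 0.9 and the second is essentially zero, as expected for a first-order model.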


LPC


LPC Variants

• Multi pulse excited LPC (MPE-LPC)

– Excitation by a single pulse per pitch period produces audible distortion

– Uses multiple pulses per pitch period instead, typically eight


Multi-Pulse Coder


Code Excited LP

• The prediction residual of each frame is represented by codewords from a VQ-generated codebook

• 40-sample codewords for a 5 msec frame at 8 kHz sampling rate

• Can use either a “deterministic” or a “stochastic” codebook; 10-bit codebooks are common

• Stochastic codebooks are motivated by the observation that the histogram of the residual from the long-term predictor is roughly Gaussian, so the codebook is constructed from white Gaussian random numbers with unit variance

• The CDMA cellular standard (IS-95) uses variable-rate CELP at 1.2 to 14.4 kbps
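A stochastic codebook search can be sketched as follows, using the slide's figures of 40-sample codewords and a 10-bit codebook. This is a deliberately simplified illustration: it matches the target residual directly with a gain-scaled codeword, whereas a real CELP coder passes each candidate through the synthesis filter and a perceptual weighting filter before comparing. The function names and the random seed are hypothetical.

```python
import random

FRAME_LEN = 40       # samples per codeword: 5 msec at 8 kHz
CODEBOOK_BITS = 10   # 2**10 = 1024 codewords


def make_stochastic_codebook(seed=0):
    """Codebook of white Gaussian vectors with unit variance."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]
            for _ in range(2 ** CODEBOOK_BITS)]


def search_codebook(target, codebook):
    """Return (index, gain) minimizing ||target - gain * codeword||^2."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for index, codeword in enumerate(codebook):
        cross = sum(t * c for t, c in zip(target, codeword))
        energy = sum(c * c for c in codeword)
        score = cross * cross / energy   # error reduction with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = index, cross / energy, score
    return best_index, best_gain


codebook = make_stochastic_codebook()
# Sanity check: a target built from codeword 17 should be found again.
target = [2.5 * c for c in codebook[17]]
index, gain = search_codebook(target, codebook)
print(index, gain)
```

Only the 10-bit index and a quantized gain need to be sent for the excitation of each 5 msec frame.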


VQ Codebook of LPC Vectors

[Figure: 64 vectors in a codebook of spectral shapes]


RELP


Choosing Speech Codecs

– Perceived quality of speech

– Output bit rate

– Cost

– Capacity

– End-to-end encoding delay

– Algorithmic complexity

– DC power requirement

– Compatibility with existing standards

– Robustness of the encoded speech to transmission errors

– Cell size

– Multiple access technique

– Modulation technique


GSM CODEC

• Regular pulse excited long term prediction (RPE-LTP)

• Combines the advantages of RELP and MPE-LTP

• Relatively complex and power hungry

• The encoder consists of four processing blocks


GSM CODEC


GSM CODEC


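Measures of Speech Coder Quality

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left[ s(n) - \hat{s}(n) \right]^2}, \quad \text{over whole signal}$$

$$\mathrm{SNR}_{seg} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{SNR}_k, \quad \text{over frames of 10-20 msec}$$

good primarily for waveform coders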
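The two SNR measures can be compared on a small example. The sketch below assumes 20 msec frames at 8 kHz (160 samples) and skips frames with zero signal energy or zero error, which is one possible convention rather than a standard one. The test signal is artificial and chosen so the numbers are easy to follow.

```python
import math


def snr_db(s, s_hat):
    """SNR over the whole signal, in dB."""
    signal = sum(x * x for x in s)
    noise = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(s, s_hat, frame_len=160):
    """Average of per-frame SNRs in dB; 160 samples = 20 msec at 8 kHz."""
    frame_snrs = []
    for start in range(0, len(s) - frame_len + 1, frame_len):
        frame = s[start:start + frame_len]
        frame_hat = s_hat[start:start + frame_len]
        signal = sum(x * x for x in frame)
        noise = sum((x - y) ** 2 for x, y in zip(frame, frame_hat))
        if signal > 0.0 and noise > 0.0:   # skip silent or perfectly coded frames
            frame_snrs.append(10.0 * math.log10(signal / noise))
    return sum(frame_snrs) / len(frame_snrs)


# Artificial example: the first frame has a larger error than the second.
s = [1.0] * 320
s_hat = [1.1] * 160 + [1.01] * 160
print(snr_db(s, s_hat), segmental_snr_db(s, s_hat))
```

The overall SNR is dominated by the noisier frame, while the segmental SNR weights both frames equally, which is why the two values differ.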


Measures of Speech Coder Quality

• Intelligibility: Diagnostic Rhyme Test (DRT)

– compare words that differ in leading consonant

– identify spoken word as one of a pair of choices

– high scores (~90%) obtained for all coders above 4 kbps

• Subjective quality: Mean Opinion Score (MOS)

– 5 excellent quality

– 4 good quality

– 3 fair quality

– 2 poor quality

– 1 bad quality

• MOS scores for high quality wideband speech (~4.5) and for high quality telephone bandwidth speech (~4.1)
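A MOS is simply the average of listener ratings on the 1 to 5 scale above. The short sketch below shows the arithmetic; the ratings are made-up values for illustration only, not measurements of any coder.

```python
def mean_opinion_score(ratings):
    """Average of listener ratings, each an integer from 1 (bad) to 5 (excellent)."""
    if not ratings or any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers from 1 to 5")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from eight listeners.
ratings = [4, 4, 5, 3, 4, 4, 5, 4]
mos = mean_opinion_score(ratings)
print(mos)
```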


Mean Opinion Score(MOS)
