Speech Coding Techniques. Introduction Efficient speech-coding techniques Advantages for VoIP...


Introduction
Efficient speech-coding techniques
  Advantages for VoIP
  Digital streams of ones and zeros
  The lower the bandwidth, the lower the quality
  RTP payload types
Processing power
  Better quality (for a given bandwidth) requires a more complex algorithm
  A balance between quality and cost

Voice Quality
Bandwidth is easily quantified
Voice quality is subjective
MOS, Mean Opinion Score
  ITU-T Recommendation P.800
  Excellent – 5, Good – 4, Fair – 3, Poor – 2, Bad – 1
  A minimum of 30 people
  Listen to voice samples or take part in conversations

P.800 recommendations
  The selection of participants
  The test environment
  Explanations to listeners
  Analysis of results
Toll quality
  A MOS of 4.0 or higher
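As a rough illustration of how a MOS is obtained, the sketch below averages listener ratings on the 1 to 5 scale and compares the result with the toll-quality threshold of 4.0. The function names and the sample panel are invented for illustration; P.800 itself prescribes far more (participant selection, test environment, analysis of results).

```python
def mean_opinion_score(ratings):
    """Average listener ratings on the P.800 scale (1 = Bad ... 5 = Excellent)."""
    if len(ratings) < 30:
        raise ValueError("the slides call for a minimum of 30 listeners")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


def is_toll_quality(mos):
    """Toll quality is a MOS of 4.0 or higher."""
    return mos >= 4.0


# Hypothetical panel of 30 listeners: 12 rate Excellent, 12 Good, 6 Fair.
ratings = [5] * 12 + [4] * 12 + [3] * 6
mos = mean_opinion_score(ratings)
print(round(mos, 2), is_toll_quality(mos))
```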

About Speech
Speech
  Air pushed from the lungs past the vocal cords and along the vocal tract
  The basic vibrations – vocal cords
  The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
  The shape changes relatively slowly
The vibrations at the vocal cords
  The excitation signal

Speech sounds
Voiced sounds
  The vocal cords vibrate open and closed
  Quasi-periodic pulses of air
  The rate of the opening and closing – the pitch
Unvoiced sounds
  Forcing air at high velocities through a constriction
  Noise-like turbulence
  Show little long-term periodicity
  Short-term correlations still present
Plosive sounds
  A complete closure in the vocal tract
  Air pressure is built up and released suddenly

Voice Sampling
Discrete-Time LTI Systems: The Convolution Sum

$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$$

$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$

[Figure: stem plots of an example input x[n], impulse response h[n], and output y[n]]
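The convolution sum y[n] = Σ_k x[k] h[n−k] can be sketched directly for finite sequences. The sample values below are assumptions chosen for illustration (the original stem plots did not survive extraction), and `convolve` is a hypothetical helper name.

```python
def convolve(x, h):
    """Return y[n] = sum_k x[k] * h[n - k] for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm   # x[k] contributes a shifted, scaled copy of h
    return y


x = [0.5, 2.0]        # assumed input samples x[0], x[1]
h = [1.0, 1.0, 1.0]   # assumed impulse response h[0..2]
y = convolve(x, h)
print(y)
```

Each input sample launches its own scaled copy of h[n]; the output is the sum of those overlapping copies.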

Nyquist sampling theorem

$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$$

$$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$$

$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$

[Figure: the band-limited spectrum X_c(jΩ), occupying −Ω_N to Ω_N, and its replicas at multiples of Ω_s; axis marks at 0, ±Ω_N, ±Ω_s and Ω_s − Ω_N]
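Sampling with period T repeats the spectrum at multiples of Ω_s = 2π/T, so the copies stay separate only when Ω_s > 2Ω_N. A minimal sketch of what happens otherwise, using a hypothetical helper `apparent_frequency` and the 8 kHz rate used elsewhere in these slides:

```python
def apparent_frequency(f_hz, fs_hz):
    """Frequency (Hz) at which a tone of f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz                # the spectrum repeats every fs
    return min(folded, fs_hz - folded)   # and is mirrored about fs / 2


fs = 8000  # sampling rate in Hz
for tone in (1000, 3400, 5000, 7000):
    print(tone, "->", apparent_frequency(tone, fs))
```

Tones below 4 kHz come through unchanged, while 5 kHz and 7 kHz fold back into the band, which is why the input is band-limited before sampling.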

Quantization (Scalar Quantization)

Assume $|x[n]| \le A$; divide the range $[-A, A]$ into $L$ quantization levels $\{J_1, J_2, \ldots, J_k, \ldots, J_L\}$
$J_k : [m_{k-1}, m_k]$
$L = 2^R$
Each quantization level $J_k$ is represented by a value $v_k$
$S = \bigcup_k J_k$, $V = \{v_1, v_2, \ldots, v_k, \ldots, v_L\}$

[Figure: the range from m_0 = −A to m_L = A divided at the boundaries m_1, m_2, …, m_k, m_{k+1}, …, m_{L−1}, with the representative values v_1, v_2, …, v_{k+1}, …, v_L marked inside the intervals]
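A minimal sketch of a uniform scalar quantizer following the definitions above. Two things are assumptions rather than part of the slide: the intervals are taken to be equal in width, and each v_k is taken to be the midpoint of its interval. The code indexes intervals from 0, whereas the slide counts J_1 to J_L.

```python
def uniform_quantize(x, A, R):
    """Map x in [-A, A] to (k, v): the 0-based interval index and its midpoint value."""
    L = 2 ** R                             # number of quantization levels
    step = 2 * A / L                       # width of each interval [m_{k-1}, m_k]
    x = max(-A, min(A, x))                 # clip, since |x[n]| <= A is assumed
    k = min(int((x + A) / step), L - 1)    # x = A belongs to the last interval
    v = -A + (k + 0.5) * step              # representative value: interval midpoint
    return k, v


print(uniform_quantize(0.3, A=1.0, R=2))
```

With R = 2 there are four intervals of width 0.5, and 0.3 lands in the third one, [0, 0.5], whose midpoint is 0.25. Only the index k needs to be transmitted, at R bits per sample.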

Non-Uniform Quantization

[Figure: quantization boundaries m_0 = −A, m_1, m_2, …, 0, …, m_L = A]

Concept: small quantization levels for small x, large quantization levels for large x
Goal: constant SNR_Q for all x

Companding

[Block diagram: x[n] → compressor F(x) → uniform quantization → …1101… → uniform decoder → expandor F^{-1}(x) → x̂[n]]

Compressor + Expandor = Compandor
F(x) specifies the non-uniform quantization characteristics

Non-Uniform Quantization

μ-law

$$F(x) = \frac{\log(1 + \mu x)}{\log(1 + \mu)}, \qquad 0 \le x \le 1$$

A-law

$$F(x) = \begin{cases} \dfrac{A x}{1 + \ln A}, & 0 \le x \le \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln(A x)}{1 + \ln A}, & \dfrac{1}{A} \le x \le 1 \end{cases}$$

Typical values in practice: μ = 255, A = 87.6
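The μ-law compressor and its inverse can be sketched in a few lines. The functions below assume samples normalized to [−1, 1] and extend F(x) to negative inputs by odd symmetry, which is the usual convention but is not shown on the slide; the names `mu_compress` and `mu_expand` are illustrative.

```python
import math

MU = 255.0  # typical value in practice


def mu_compress(x, mu=MU):
    """Compressor F(x) = log(1 + mu|x|) / log(1 + mu), with the sign of x restored."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Expandor: the inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_compress(x)
    print(f"x={x:<5} F(x)={y:.3f} back={mu_expand(y):.3f}")
```

Small inputs are stretched over a much larger share of the output range, so a uniform quantizer applied after the compressor spends more of its levels on quiet signals.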

Types of Speech Codecs
Waveform codecs, source codecs (also known as vocoders), and hybrid codecs.

Speech Source Model and Source Coding

[Block diagram: a random sequence generator (unvoiced) or a periodic pulse train generator with period N (voiced) is selected by the v/u switch and scaled by the gain G to form the excitation u[n], which drives the vocal tract model G(z), G(ω), g[n] to produce x[n]]

$$G(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$

Excitation parameters
  v/u : voiced/unvoiced
  N : pitch for voiced
  G : signal gain
  Together these give the excitation signal u[n]
Vocal tract parameters
  {a_k} : LPC coefficients
  These give the formant structure of speech signals
A good approximation, though not precise enough
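To make the source model concrete, the sketch below builds an excitation u[n] from the three excitation parameters (v/u, N, G) and passes it through the all-pole vocal tract filter, that is x[n] = u[n] + Σ_k a_k x[n−k]. The coefficient value and function names are toy assumptions, not data from a real speech frame.

```python
import random


def excitation(voiced, N, G, length, seed=0):
    """u[n]: gain-scaled pulse train of period N (voiced) or random sequence (unvoiced)."""
    if voiced:
        return [G if n % N == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [G * rng.uniform(-1.0, 1.0) for _ in range(length)]


def vocal_tract(u, a):
    """All-pole filter 1 / (1 - sum_k a_k z^-k): x[n] = u[n] + sum_k a_k x[n-k]."""
    x = []
    for n, un in enumerate(u):
        acc = un
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        x.append(acc)
    return x


a = [0.5]   # toy single-coefficient filter, not real LPC data
x = vocal_tract(excitation(True, N=4, G=1.0, length=6), a)
print(x)
```

A source coder transmits only v/u, N, G and {a_k} per frame and lets the receiver regenerate the waveform this way, which is where the large bit-rate saving comes from.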

LPC Vocoder (Voice Coder)

[Block diagram: x[n] → LPC analysis → {a_k}, N, G, v/u → encoder → …11011… → decoder (receiver) → {a_k}, N, G, v/u → excitation generator (Ex) → g[n], G(z) → x[n]]

N by pitch detection
v/u by voicing detection
{a_k} can be non-uniformly or vector quantized to reduce the bit rate further
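The slide does not say how the LPC analysis block computes {a_k}. One common choice is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below as an assumption; the function names are illustrative, and the code expects a non-silent frame (r[0] > 0).

```python
def autocorrelation(x, max_lag):
    """r[i] = sum_n x[n] * x[n - i] for i = 0..max_lag over one analysis frame."""
    return [sum(x[n] * x[n - i] for n in range(i, len(x))) for i in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Estimate {a_k} so that x[n] is approximated by sum_k a_k x[n-k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)     # a[0] is unused; a[k] holds a_k
    err = r[0]                  # prediction-error energy so far
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


# Toy frame: impulse response of 1 / (1 - 0.9 z^-1), so the true coefficient is 0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc_coefficients(frame, order=2)
print([round(c, 4) for c in coeffs])
```

On this toy frame the first coefficient comes out very close to 0.9 and the second very close to zero, matching the filter that generated the data.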

G.711
The most commonplace codec
  Used in the circuit-switched telephone network
  PCM, Pulse-Code Modulation
If uniform quantization
  12 bits × 8 k samples/sec = 96 kbps
Non-uniform quantization
  64 kbps, the DS0 rate
  μ-law: North America
  A-law: other countries, a little friendlier to lower signal levels
An MOS of about 4.3
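The PCM bit rates follow from bits per sample times samples per second. A small worked check, where the helper name is hypothetical and the 8-bit sample size is the one G.711 uses for its companded samples:

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz=8000):
    """Bit rate in bits per second for PCM at the given sample size and rate."""
    return bits_per_sample * sample_rate_hz


print(pcm_bit_rate(12) / 1000, "kbps with 12-bit uniform quantization")
print(pcm_bit_rate(8) / 1000, "kbps with 8-bit companded samples (G.711, DS0)")
```

Companding therefore saves a third of the bandwidth relative to 12-bit uniform PCM while keeping comparable quality for quiet signals.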

ADPCM (adaptive differential PCM)
DPCM and ADPCM
  ADPCM: adaptive prediction in DPCM, adaptive quantization
Adaptive quantization
  Quantization level varies with the local signal level
  Δ[n] = a σ_x[n]
  σ_x[n]: locally estimated standard deviation of x[n]
G.721: ADPCM-coded speech at 32 kbps
G.726 (A-law or μ-law)
  16, 24, 32, 40 kbps
  MOS 4.0 at 32 kbps
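The idea behind ADPCM can be shown with a toy coder: predict each sample from the previous reconstructed one, quantize only the difference, and let the quantizer step follow the signal. This is not G.721 or G.726; the predictor, the 4-bit code range and the step-adaptation rule in `next_step` are simplifying assumptions made for illustration.

```python
import math


def next_step(step, code, max_code):
    """Grow the step after large codes, shrink it after small ones (toy rule)."""
    if abs(code) >= max_code - 1:
        return step * 1.5
    if abs(code) <= 1:
        return max(step * 0.75, 1e-4)
    return step


def adpcm_encode(samples, bits=4, step=0.05):
    """Return (codes, reconstruction) using a previous-sample predictor."""
    max_code = 2 ** (bits - 1) - 1
    predicted, codes, recon = 0.0, [], []
    for x in samples:
        code = max(-max_code, min(max_code, round((x - predicted) / step)))
        codes.append(code)
        predicted += code * step     # the same reconstruction the decoder will form
        recon.append(predicted)
        step = next_step(step, code, max_code)
    return codes, recon


def adpcm_decode(codes, bits=4, step=0.05):
    """Rebuild the signal from the codes alone, adapting the step exactly as the encoder did."""
    max_code = 2 ** (bits - 1) - 1
    predicted, out = 0.0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = next_step(step, code, max_code)
    return out


samples = [math.sin(2 * math.pi * n / 40) for n in range(80)]
codes, local = adpcm_encode(samples)
decoded = adpcm_decode(codes)
print(max(abs(x - y) for x, y in zip(samples, decoded)))
```

Because the step is adapted from the transmitted codes only, the decoder stays in step with the encoder without any side information.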

Analysis-by-Synthesis (AbS) Codecs
Hybrid codec
  Fill the gap between waveform and source codecs
  The most successful and commonly used
Time-domain AbS codecs
  Not a simple two-state, voiced/unvoiced model
  Different excitation signals are attempted
  The one closest to the original waveform is selected
  MPE, Multi-Pulse Excited
  RPE, Regular-Pulse Excited
  CELP, Code-Excited Linear Predictive

G.728 LD-CELP
CELP codecs
  A filter; its characteristics change over time
  A codebook of acoustic vectors
    A vector = a set of elements representing various characteristics of the excitation
  Transmit
    Filter coefficients, gain, a pointer to the vector chosen
Low-Delay CELP
  Backward-adaptive coder
    Uses previous samples to determine filter coefficients
  Operates on five samples at a time
    Delay < 1 ms
  Only the pointer is transmitted

1024 vectors in the codebook
  10-bit pointer (index)
16 kbps
LD-CELP encoder
  Minimizes a frequency-weighted mean-square error
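The analysis-by-synthesis search can be sketched as follows: every codebook vector is passed through the synthesis filter, and the index of the one closest to the target block is kept. This is a simplified stand-in, not the G.728 algorithm: the codebook contents are random, the filter coefficients are made up, and the error is a plain rather than frequency-weighted squared error.

```python
import random


def synthesize(excitation, a, history):
    """Filter an excitation vector through 1 / (1 - sum_k a_k z^-k), given past outputs."""
    past = list(history)        # must hold at least len(a) previous output samples
    out = []
    for u in excitation:
        y = u + sum(ak * past[-k] for k, ak in enumerate(a, start=1))
        out.append(y)
        past.append(y)
    return out


def search_codebook(target, codebook, a, history):
    """Return (index, error) of the vector whose synthesized output is closest to target."""
    best_index, best_error = -1, float("inf")
    for index, vector in enumerate(codebook):
        candidate = synthesize(vector, a, history)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))  # unweighted
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


rng = random.Random(1)
codebook = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(1024)]  # made-up contents
a, history = [0.5, -0.2], [0.0, 0.0]
target = synthesize(codebook[37], a, history)   # a target that vector 37 reproduces exactly
index, error = search_codebook(target, codebook, a, history)
print(index, error, (len(codebook) - 1).bit_length(), "bits per index")
```

With 1024 vectors the index needs 10 bits, and sending 10 bits for every 5 samples at 8000 samples per second gives 10 × 8000 / 5 = 16 000 bits per second, the 16 kbps quoted above.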

LD-CELP decoder
An MOS of about 3.9
One-quarter of G.711 bandwidth

G.723.1 ACELP
6.3 or 5.3 kbps
  Both mandatory
  Can change from one to the other during a conversation
The coder
  A band-limited input speech signal
  Sampled at 8 kHz, 16-bit uniform PCM quantization
  Operates on blocks of 240 samples at a time
  A look-ahead of 7.5 ms
  A total algorithmic delay of 37.5 ms + other delays
  A high-pass filter to remove any DC component

G.723.1 Annex A
  Silence Insertion Descriptor (SID) frames of size four octets
The two LSBs of the first octet
  00  6.3 kbps   24 octets/frame
  01  5.3 kbps   20 octets/frame
  10  SID frame   4 octets/frame
An MOS of about 3.8
At least 37.5 ms delay
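A receiver can tell the three frame types apart from the two least-significant bits of the first octet. A minimal sketch using only the table above; the names are hypothetical, and the pattern 11 is rejected because the slide does not list it.

```python
# Frame types keyed by the two least-significant bits of the first octet.
FRAME_TYPES = {
    0b00: ("6.3 kbps speech", 24),
    0b01: ("5.3 kbps speech", 20),
    0b10: ("SID", 4),
}


def classify_frame(first_octet):
    """Return (description, frame size in octets) for a G.723.1 frame's first octet."""
    code = first_octet & 0b11
    if code not in FRAME_TYPES:
        raise ValueError("bit pattern 11 is not listed on the slide")
    return FRAME_TYPES[code]


print(classify_frame(0b10110100))   # low bits 00
print(classify_frame(0b00000010))   # low bits 10
```

Because the size follows from the first octet, a depacketizer can walk a payload holding several frames, possibly at different rates, without any extra length field.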

G.729
8 kbps
Input frames of 10 ms, 80 samples for 8 kHz sampling rate
5 ms look-ahead
  Algorithmic delay of 15 ms
An 80-bit frame for 10 ms of speech
A complex codec
  G.729.A (Annex A), a number of simplifications
  Same frame structure
  Encoder and decoder may each be G.729 or G.729.A
  Slightly lower quality
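The delay and rate figures for frame-based codecs follow from the frame size and look-ahead. A short worked check with hypothetical helper names, using the G.729 numbers above and the G.723.1 frame of 240 samples with 7.5 ms look-ahead:

```python
def frame_ms(samples, sample_rate_hz=8000):
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def algorithmic_delay_ms(samples, look_ahead_ms, sample_rate_hz=8000):
    """A full frame plus the look-ahead must be collected before coding can start."""
    return frame_ms(samples, sample_rate_hz) + look_ahead_ms


def bit_rate_bps(frame_bits, samples, sample_rate_hz=8000):
    """Bits per second when frame_bits are sent for every frame of samples."""
    return frame_bits * sample_rate_hz / samples


print("G.729:  ", algorithmic_delay_ms(80, 5.0), "ms,", bit_rate_bps(80, 80), "bps")
print("G.723.1:", algorithmic_delay_ms(240, 7.5), "ms")
```

These are algorithmic delays only; processing, packetization and network delays come on top.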

G.729.B
VAD, Voice Activity Detection
  Based on analysis of several parameters of the input
  The current frame plus the two preceding frames
DTX, Discontinuous Transmission
  Send nothing or send an SID frame
  The SID frame contains information to generate comfort noise
CNG, Comfort Noise Generation
G.729, an MOS of about 4.0
G.729A, an MOS of about 3.7
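The interplay of VAD and DTX can be illustrated with a deliberately crude detector. The real G.729.B VAD analyses several parameters of the input; the sketch below swaps in a single energy threshold, looks at the current frame plus the two preceding ones, and emits one SID frame when speech stops. The threshold, names and test data are assumptions.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def dtx_decisions(frames, threshold=1e-3, history=2):
    """Yield 'SPEECH', 'SID' or 'NONE' per frame from a toy energy-only VAD.

    A frame counts as active if it, or any of the `history` preceding frames, is above
    the threshold. The first inactive frame after speech triggers one SID frame; after
    that nothing is sent until speech resumes.
    """
    energies = []
    sending_speech = True
    for frame in frames:
        energies.append(frame_energy(frame))
        active = any(e > threshold for e in energies[-(history + 1):])
        if active:
            sending_speech = True
            yield "SPEECH"
        elif sending_speech:
            sending_speech = False
            yield "SID"
        else:
            yield "NONE"


loud, quiet = [0.5] * 80, [0.0] * 80   # synthetic 80-sample (10 ms) frames
print(list(dtx_decisions([loud, quiet, quiet, quiet, quiet, loud])))
```

The receiver would use the SID frame's parameters to generate comfort noise during the frames for which nothing arrives.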

Other Codecs
CDMA QCELP, defined in IS-733
  Variable-rate coder
  Two most common rates
    The high rate, 13.3 kbps
    A lower rate, 6.2 kbps
  Silence suppression
  For use with RTP, RFC 2658

GSM Enhanced Full-Rate (EFR)
  GSM 06.60
  An enhanced version of GSM Full-Rate
  ACELP-based codec
  The same bit rate and the same overall packing structure
    12.2 kbps
  Supports discontinuous transmission
  For use with RTP, RFC 1890

GSM Adaptive Multi-Rate (AMR) codec
  GSM 06.90
  Eight different modes
    4.75 kbps to 12.2 kbps
    12.2 kbps, GSM EFR
    7.4 kbps, IS-641 (TDMA cellular systems)
  Change the mode at any time
  Offers discontinuous transmission
  The coding choice of many 3G wireless networks

The MOS values are for laboratory conditions
  G.711 does not deal with lost packets
  G.729 can accommodate a lost frame by interpolating from previous frames
    But this causes errors in subsequent speech frames
Processing power
  G.728 or G.729, 40 MIPS
  G.726, 10 MIPS

Cascaded Codecs
  E.g., G.711 stream -> G.729 encoder/decoder
  Might not even come close to G.729 quality
  Each coder only generates an approximation of the incoming signal

Tones, Signals, and DTMF Digits
The hybrid codecs are optimized for human speech
  Other data may need to be transmitted
  Tones: fax tones, dial tone, busy tone
  DTMF digits for two-stage dialing or voice-mail
G.711 is OK
G.723.1 and G.729 can make them unintelligible
The ingress gateway needs to intercept
  The tones and DTMF digits
  Use an external signaling system

Easy at the start of a call
Difficult in the middle of a call
Encode the tones differently from the speech
  Send them along the same media path
  An RTP packet provides the name of the tone and the duration
  Or, a dynamic RTP profile; an RTP packet containing the frequency, volume and the duration
RFC 2198
  An RTP payload format for redundant audio data
  Sending both types of RTP payload

RTP Payload Format for DTMF Digits
  An Internet Draft
  Both methods described before
  A large number of tones and events
    DTMF digits, a busy tone, a congestion tone, a ringing tone, etc.
  The named events
    E: the end of the tone, R: reserved


Payload format
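The payload diagram itself did not survive extraction. The sketch below assumes the named-event layout as later published in RFC 2833: an 8-bit event code, the E (end) bit, the R (reserved) bit, a 6-bit volume, and a 16-bit duration in timestamp units. The function names are illustrative.

```python
import struct


def pack_event(event, end, volume, duration):
    """Pack one 4-byte named-event payload: event | E R volume | duration."""
    if not (0 <= event <= 255 and 0 <= volume <= 63 and 0 <= duration <= 0xFFFF):
        raise ValueError("field out of range")
    flags = (0x80 if end else 0x00) | volume   # R bit (0x40) stays zero
    return struct.pack("!BBH", event, flags, duration)


def unpack_event(payload):
    """Inverse of pack_event: return (event, end, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return event, bool(flags & 0x80), flags & 0x3F, duration


# DTMF digit 5, end of tone, volume field 10, 800 timestamp units (100 ms at 8 kHz).
payload = pack_event(event=5, end=True, volume=10, duration=800)
print(payload.hex())
```

Carrying the digit as a named event rather than as coded audio lets the egress gateway regenerate a clean tone regardless of which speech codec is in use.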


Finis


Speech Source Model and Source Coding
Vocal Tract Model

$$x[n] - \sum_{k=1}^{P} a_k x[n-k] = u[n]$$

$$G(z) = \frac{X(z)}{U(z)} = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$