Speech-Coding Techniques Chapter 3. Internet Telephony 3-2 Introduction Efficient speech-coding...

Speech-Coding Techniques

Chapter 3

3-2Internet Telephony

Introduction

Efficient speech-coding techniques
  Advantages for VoIP
  Digital streams of ones and zeros
The lower the bandwidth, the lower the quality
  RTP payload types
Processing power
  Better quality (for a given bandwidth) requires a more complex algorithm
  A balance between quality and cost

3-3Internet Telephony

Voice Quality

Bandwidth is easily quantified
Voice quality is subjective
  MOS, Mean Opinion Score
  ITU-T Recommendation P.800
    Excellent – 5
    Good – 4
    Fair – 3
    Poor – 2
    Bad – 1
  A minimum of 30 people
  Listen to voice samples or take part in conversations
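
A MOS is simply the average of the listeners' ratings on the five-point scale. The sketch below shows that arithmetic; the 30 votes are made up for illustration and the function name is hypothetical.

```python
from statistics import mean

# Five-point opinion scale as listed on the slide.
SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}

def mean_opinion_score(ratings):
    """Average the listeners' category ratings into a single MOS."""
    return mean(SCALE[r] for r in ratings)

# Hypothetical panel of 30 listeners (invented votes, not measured data).
panel = ["Excellent"] * 6 + ["Good"] * 18 + ["Fair"] * 6
print(mean_opinion_score(panel))
```

With these invented votes the panel lands exactly on the toll-quality threshold of 4.0.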

3-4Internet Telephony

P.800 recommendations
  The selection of participants
  The test environment
  Explanations to listeners
  Analysis of results
Toll quality
  A MOS of 4.0 or higher

3-5Internet Telephony

Subjective and objective quality-testing techniques
PSQM – Perceptual Speech Quality Measurement
  ITU-T P.861
  Intended to faithfully represent human judgement and perception
  An algorithmic comparison between the output signal and a known input
  Factors: type of speaker, loudness, delay, active/silence frames, clipping, environmental noise

3-6Internet Telephony

A Little About Speech

Speech
  Air pushed from the lungs past the vocal cords and along the vocal tract
  The basic vibrations – vocal cords
  The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
  The shape changes relatively slowly
The vibrations at the vocal cords
  The excitation signal

3-7Internet Telephony

Speech sounds

Voiced sound
  The vocal cords vibrate open and closed
  Interrupt the air flow
  Quasi-periodic pulses of air
  The rate of the opening and closing – the pitch
  A high degree of periodicity at the pitch period
    2-20 ms
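
One common way to find the pitch period is to look for the lag at which the signal best matches a shifted copy of itself. The sketch below does this with a plain autocorrelation search over the 2-20 ms range; the impulse-train input and the function name are illustrative assumptions, not part of the slides.

```python
SAMPLE_RATE = 8000  # samples per second (telephony rate)

def estimate_pitch_period(signal, min_ms=2, max_ms=20):
    """Return the lag (in samples) with the highest autocorrelation,
    searching only the 2-20 ms pitch-period range."""
    min_lag = SAMPLE_RATE * min_ms // 1000   # 16 samples
    max_lag = SAMPLE_RATE * max_ms // 1000   # 160 samples

    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    return max(range(min_lag, max_lag + 1), key=autocorr)

# Synthetic "voiced" excitation: one pulse every 80 samples (10 ms).
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
lag = estimate_pitch_period(pulses)
print(lag, "samples,", 1000 * lag / SAMPLE_RATE, "ms,", SAMPLE_RATE / lag, "Hz")
```

Real speech is only quasi-periodic, so a practical estimator would need windowing and some protection against picking a multiple of the true period.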

3-8Internet Telephony

Voiced speech Power spectrum density

3-9Internet Telephony

Unvoiced sounds
  Forcing air at high velocities through a constriction
  The glottis is held open
  Noise-like turbulence
  Show little long-term periodicity
  Short-term correlations still present

3-10Internet Telephony

unvoiced speech Power spectrum density

3-11Internet Telephony

Plosive sounds
  A complete closure in the vocal tract
  Air pressure is built up and released suddenly
A vast array of sounds
  The speech signal is relatively predictable over time
  The reduction of transmission bandwidth can be significant

3-12Internet Telephony

Voice Sampling

A-to-D
  Take discrete samples of the waveform and represent each sample by some number of bits
  A signal can be reconstructed if it is sampled at a minimum of twice the maximum frequency
  Human speech
    300-3800 Hz
    8000 samples per second
Each sample is encoded into an 8-bit PCM code word (e.g. 01100101)
  => 8000 x 8 bit/s
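
The bit rate follows directly from the sampling rule and the code-word size. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def pcm_bit_rate(max_freq_hz, bits_per_sample, sample_rate_hz=None):
    """Bit rate of plain PCM: samples per second times bits per sample.

    If no sampling rate is given, the minimum rate of twice the maximum
    frequency is used.
    """
    minimum_rate = 2 * max_freq_hz
    rate = sample_rate_hz if sample_rate_hz is not None else minimum_rate
    if rate < minimum_rate:
        raise ValueError("sampling below twice the maximum frequency")
    return rate * bits_per_sample

print(2 * 3800)                     # minimum sampling rate for 3800 Hz: 7600
print(pcm_bit_rate(3800, 8, 8000))  # 8000 samples/s x 8 bits: 64000
```

Sampling at 8000 Hz leaves a small margin above the 7600 Hz minimum.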

3-13Internet Telephony

Quantization

How many bits are used to represent each sample
Quantization noise
  The difference between the actual level of the input analog signal and the quantized level
  More bits to reduce it
    Diminishing returns
Uniform quantization levels
  Louder talkers sound better
  11.2/11 vs. 2.2/2
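
The 11.2/11 versus 2.2/2 comparison can be made concrete: with a uniform step the absolute error is the same, but it is a much larger fraction of the quieter signal. A small sketch, assuming a step size of 1 and round-to-nearest quantization:

```python
def relative_error(actual, step=1.0):
    """Quantize to the nearest multiple of `step`, return error / actual."""
    quantized = round(actual / step) * step
    return abs(actual - quantized) / actual

for level in (11.2, 2.2):
    print(level, f"{100 * relative_error(level):.1f}% relative error")
```

The loud sample is off by under 2 percent, the quiet one by about 9 percent, which is why louder talkers sound better under uniform quantization.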

3-14Internet Telephony

Non-uniform quantization
  Smaller quantization steps at smaller signal levels
  Spread signal-to-noise ratio more evenly
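
One way to get smaller steps at low levels is to compress the signal logarithmically before uniform quantization and expand it afterwards. The sketch below uses the continuous mu-law curve with mu = 255 as an assumption; real G.711 coders use a segmented approximation, so this illustrates the idea only.

```python
import math

MU = 255  # assumption: the value commonly quoted for 8-bit mu-law

def compress(x):
    """Continuous mu-law curve for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_uniform(x, bits=8):
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

def quantize_mulaw(x, bits=8):
    return expand(quantize_uniform(compress(x), bits))

def relative_error(x, quantizer):
    return abs(x - quantizer(x)) / abs(x)

for x in (0.9, 0.01):
    print(x,
          f"uniform {100 * relative_error(x, quantize_uniform):.2f}%",
          f"mu-law {100 * relative_error(x, quantize_mulaw):.2f}%")
```

The companded quantizer gives up a little accuracy on loud samples in exchange for a much smaller relative error on quiet ones, so the error is spread more evenly across levels.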

3-15Internet Telephony

DTX and Comfort Noise

DTX is Discontinuous Transmission
A voice activity detector (VAD) detects whether there is active speech or not
When there is no active speech, different DTX procedures can be used:
  No transmission at all
  Comfort Noise (CN) using RFC 3389
  Codec built-in CN, such as the AMR SID (Silence Descriptor)
The frequency of Comfort Noise packets varies but is usually some fraction of the normal packet rate
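
A rough sketch of how a sender might combine a VAD with DTX. The energy threshold, the "CN every 8th silent frame" interval, and the function names are all illustrative assumptions; real VADs analyse several signal parameters rather than energy alone.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def dtx_decisions(frames, threshold=1e-3, cn_interval=8):
    """Decide per frame whether to send speech, a comfort-noise update, or nothing."""
    decisions = []
    silent_run = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            silent_run = 0
            decisions.append("SPEECH")
        else:
            # First silent frame and every cn_interval-th one after it carry CN.
            decisions.append("CN" if silent_run % cn_interval == 0 else "NONE")
            silent_run += 1
    return decisions

frames = [[0.5] * 160] * 2 + [[0.0] * 160] * 10
print(dtx_decisions(frames))
```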

3-16Internet Telephony

Types of Speech Coders

Waveform codecs
  Sample and code
  High quality and not complex
  Large amount of bandwidth
Source codecs (vocoders)
  Match the incoming signal to a mathematical model
  Linear-predictive filter model of the vocal tract
  A voiced/unvoiced flag for the excitation
  The information is sent rather than the signal
  Low bit rates, but sounds synthetic
  Higher bit rates do not improve much

3-17Internet Telephony

Hybrid codecs
  Attempt to provide the best of both
  Perform a degree of waveform matching
  Utilize the sound production model
  Quite good quality at low bit rate

3-18Internet Telephony

G.711

The most commonplace codec
  Used in the circuit-switched telephone network
  PCM, Pulse-Code Modulation
If uniform quantization
  12 bits * 8 k/sec = 96 kbps
Non-uniform quantization
  64 kbps DS0 rate
  mu-law
    North America
  A-law
    Other countries, a little friendlier to lower signal levels
An MOS of about 4.3

3-19Internet Telephony

DPCM

DPCM, Differential PCM
  Only transmit the difference between the predicted value and the actual value
  Voice changes relatively slowly
  It is possible to predict the value of a sample based on the values of previous samples
  The receiver performs the same prediction
The simplest form
  No prediction
  No algorithmic delay
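
A minimal sketch of the differential idea, assuming the simplest case in which each sample is compared with the previous one and the difference is sent unquantized. The sample values are invented; a real coder would quantize the differences to fewer bits.

```python
def dpcm_encode(samples):
    """Send each sample as its difference from the previous sample."""
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by running the same accumulation at the receiver."""
    previous = 0
    samples = []
    for d in diffs:
        previous += d
        samples.append(previous)
    return samples

samples = [100, 104, 109, 111, 110, 106]
diffs = dpcm_encode(samples)
print(diffs)               # [100, 4, 5, 2, -1, -4]
print(dpcm_decode(diffs))  # [100, 104, 109, 111, 110, 106]
```

Because voice changes slowly, the differences after the first sample are small and need fewer bits than the samples themselves.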

3-20Internet Telephony

ADPCM

ADPCM, Adaptive DPCM
  Predicts sample values based on
    Past samples
    Factoring in some knowledge of how speech varies over time
  The error is quantized and transmitted
    Fewer bits required
G.721
  32 kbps
G.726
  A-law/mu-law PCM -> 16, 24, 32, 40 kbps
  An MOS of about 4.0 at 32 kbps
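
The four G.726 rates correspond to a whole number of bits per sample. A quick check of that arithmetic, assuming the 8000 samples per second used for telephony speech:

```python
SAMPLE_RATE = 8000   # assumption: telephony sampling rate
PCM_RATE = 64_000    # G.711 rate that G.726 converts from

def bits_per_sample(rate_bps):
    return rate_bps // SAMPLE_RATE

for rate in (16_000, 24_000, 32_000, 40_000):
    print(rate, "bps ->", bits_per_sample(rate), "bits/sample,",
          f"{PCM_RATE / rate:.1f}x smaller than 64 kbps PCM")
```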

3-21Internet Telephony

Analysis-by-Synthesis (AbS) Codecs

Hybrid codec
  Fill the gap between waveform and source codecs
  The most successful and commonly used
Time-domain AbS codecs
  Not a simple two-state, voiced/unvoiced
  Different excitation signals are attempted
  Closest to the original waveform is selected
  MPE, Multi-Pulse Excited
  RPE, Regular-Pulse Excited
  CELP, Code-Excited Linear Predictive
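
The "try several excitations, keep the closest" loop can be sketched as follows. The one-pole synthesis filter, the four-entry codebook, and the plain squared-error measure are illustrative assumptions; real coders use higher-order filters, far larger codebooks, and a perceptually weighted error.

```python
def synthesize(excitation, coeff=0.9):
    """Hypothetical one-pole synthesis filter: y[n] = x[n] + coeff * y[n-1]."""
    out, prev = [], 0.0
    for x in excitation:
        prev = x + coeff * prev
        out.append(prev)
    return out

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def abs_search(target, codebook):
    """Synthesize every candidate and return the index of the closest match."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(target, synthesize(codebook[i])))

codebook = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, -1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
# Target: what entry 2 would produce, slightly perturbed.
target = [t + 0.01 for t in synthesize(codebook[2])]
index = abs_search(target, codebook)
print(index)
```

Only the winning index (plus filter and gain information, depending on the codec) has to be sent, since the decoder holds the same codebook.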

3-22Internet Telephony

G.728 LD-CELP

CELP codecs
  A filter; its characteristics change over time
  A codebook of acoustic vectors
    A vector = a set of elements representing various characteristics of the excitation
  Transmit
    Filter coefficients, gain, a pointer to the vector chosen
Low-Delay CELP
  Backward-adaptive coder
    Use previous samples to determine filter coefficients
  Operates on five samples at a time
    Delay < 1 ms
  Only the pointer is transmitted

3-23Internet Telephony

1024 vectors in the code book
  10-bit pointer (index)
  16 kbps
LD-CELP encoder
  Minimize a frequency-weighted mean-square error
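
The 16 kbps figure follows from sending one 10-bit index for every five samples. A short check of that arithmetic, assuming 8000 samples per second:

```python
SAMPLE_RATE = 8000        # assumption: telephony sampling rate
CODEBOOK_SIZE = 1024
SAMPLES_PER_VECTOR = 5

index_bits = (CODEBOOK_SIZE - 1).bit_length()           # bits to address 1024 entries
vectors_per_second = SAMPLE_RATE // SAMPLES_PER_VECTOR
bit_rate = index_bits * vectors_per_second
block_ms = 1000 * SAMPLES_PER_VECTOR / SAMPLE_RATE      # time to collect five samples

print(index_bits, vectors_per_second, bit_rate, block_ms)
```

Five samples take 0.625 ms to collect, which is consistent with the sub-millisecond delay noted for LD-CELP.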

3-24Internet Telephony

LD-CELP decoder

An MOS of about 3.9
One-quarter of G.711 bandwidth

3-25Internet Telephony

G.723.1 ACELP

6.3 or 5.3 kbps
  Both mandatory
  Can change from one to another during a conversation
The coder
  A band-limited input speech signal
  Sampled at 8 kHz, 16-bit uniform PCM quantization
  Operate on blocks of 240 samples at a time
  A look-ahead of 7.5 ms
  A total algorithmic delay of 37.5 ms + other delays
  A high-pass filter to remove any DC component

3-26Internet Telephony

Various operations to determine the appropriate filter coefficients
  5.3 kbps, Algebraic Code-Excited Linear Prediction
  6.3 kbps, Multi-pulse Maximum Likelihood Quantization
The transmission
  Linear prediction coefficients
  Gain parameters
  Excitation codebook index
  24-octet frames at 6.3 kbps, 20-octet frames at 5.3 kbps
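
The frame duration, algorithmic delay, and on-the-wire rates for G.723.1 can be derived from the figures above. A small sketch of the arithmetic (variable names are illustrative):

```python
SAMPLE_RATE = 8000
FRAME_SAMPLES = 240
LOOKAHEAD_MS = 7.5

frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE      # duration of one block
algorithmic_delay_ms = frame_ms + LOOKAHEAD_MS     # block plus look-ahead

def octet_rate_bps(octets_per_frame):
    """Rate implied by sending this many whole octets per frame."""
    return octets_per_frame * 8 * 1000 / frame_ms

print(frame_ms, algorithmic_delay_ms)
print(octet_rate_bps(24), round(octet_rate_bps(20)))
```

The octet-based rates come out slightly above the nominal 6.3 and 5.3 kbps, presumably because frames are carried as a whole number of octets.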

3-27Internet Telephony

G.723.1 Annex A
  Silence Insertion Description (SID) frames of size four octets
  The two LSBs of the first octet indicate the frame type:

    Bits  Frame type  Octets/frame
    00    6.3 kbps    24
    01    5.3 kbps    20
    10    SID frame   4

An MOS of about 3.8
At least 37.5 ms delay
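
A receiver can tell the three frame types apart by masking the two least significant bits of the first octet. A minimal sketch (the function name is hypothetical, and the value 11 is treated as unknown because the slide does not list it):

```python
# (frame type, octets per frame), keyed by the two LSBs of the first octet.
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}

def classify_frame(first_octet):
    """Return (type, length in octets), or None for a value not listed on the slide."""
    return FRAME_TYPES.get(first_octet & 0b11)

print(classify_frame(0b10110100))  # low bits 00
print(classify_frame(0b00000110))  # low bits 10
```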

3-28Internet Telephony

G.729

8 kbps
  Input frames of 10 ms, 80 samples for 8 kHz sampling rate
  5 ms look-ahead
    Algorithmic delay of 15 ms
  An 80-bit frame for 10 ms of speech
  A complex codec
G.729.A (Annex A), a number of simplifications
  Same frame structure
  Encoder/decoder can be mixed, G.729/G.729.A
  Slightly lower quality

3-29Internet Telephony

G.729.B
  VAD, Voice Activity Detection
    Based on analysis of several parameters of the input
    The current frame plus two preceding frames
  DTX, Discontinuous Transmission
    Send nothing or send an SID frame
    SID frame contains information to generate comfort noise
  CNG, Comfort Noise Generation
G.729, an MOS of about 4.0
G.729A, an MOS of about 3.7

3-30Internet Telephony

G.729 Annex D
  A lower-rate extension
  6.4 kbps; 10 ms speech samples, 64 bits/frame
  MOS comparable to 6.3 kbps G.723.1
G.729 Annex E
  A higher bit rate enhancement
  The linear prediction filter of G.729 has 10 coefficients; that of G.729 Annex E has 30 coefficients
  The codebook of G.729 has 35 bits; that of G.729 Annex E has 44 bits
  118 bits/frame; 11.8 kbps
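
All three G.729 variants use 10 ms frames, so the bit rate is set by the number of bits per frame. A quick check of the figures quoted for G.729 and its annexes:

```python
FRAME_MS = 10  # frame duration shared by G.729 and its annexes

# Bits per frame as quoted on the slides.
FRAME_BITS = {"G.729": 80, "G.729 Annex D": 64, "G.729 Annex E": 118}

def bit_rate_bps(bits_per_frame):
    return bits_per_frame * 1000 // FRAME_MS

for name, bits in FRAME_BITS.items():
    print(name, bit_rate_bps(bits), "bps")

# Algorithmic delay of base G.729: one frame plus the 5 ms look-ahead.
print(FRAME_MS + 5, "ms")
```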

3-31Internet Telephony

Other Codecs

CDMA QCELP, defined in IS-733
  Variable-rate coder
  Two most common rates
    The high rate, 13.3 kbps
    A lower rate, 6.2 kbps
  Silence suppression
  For use with RTP, RFC 2658

3-32Internet Telephony

GSM Enhanced Full-Rate (EFR)
  GSM 06.60
  An enhanced version of GSM Full-Rate
  ACELP-based codec
  The same bit rate and the same overall packing structure
    12.2 kbps
  Support discontinuous transmission
  For use with RTP, RFC 1890

3-33Internet Telephony

GSM Adaptive Multi-Rate (AMR) codec
  20 ms coding delay
  Eight different modes
    4.75 kbps to 12.2 kbps
    12.2 kbps, GSM EFR
    7.4 kbps, IS-641 (TDMA cellular systems)
  Change the mode at any time
  Offer discontinuous transmission
    The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size
  The coding choice of many 3G wireless networks
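
The saving from discontinuous transmission can be estimated from the SID size and interval. The sketch below assumes 20 ms frames and ignores packet headers; the function name is illustrative.

```python
FRAME_MS = 20             # assumption: one AMR frame per 20 ms
SID_BYTES = 5
SID_EVERY_N_FRAMES = 8

def average_rate_bps(payload_bytes, every_n_frames=1):
    """Average payload rate when payload_bytes is sent once every N frames."""
    return payload_bytes * 8 * 1000 / (FRAME_MS * every_n_frames)

sid_rate = average_rate_bps(SID_BYTES, SID_EVERY_N_FRAMES)
speech_bits_per_frame = 12_200 * FRAME_MS // 1000   # highest AMR mode

print(sid_rate, "bps during silence")
print(speech_bits_per_frame, "bits per frame at 12.2 kbps")
```

During silence the payload rate drops to a few hundred bits per second, compared with 4.75 to 12.2 kbps while speech is active.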

3-34Internet Telephony

The MOS values are for laboratory conditions
  G.711 does not deal with lost packets
  G.729 can accommodate a lost frame by interpolating from previous frames
    But this causes errors in subsequent speech frames
Processing power
  G.728 or G.729, 40 MIPS
  G.726, 10 MIPS

3-35Internet Telephony

iLBC

A free codec for robust VoIP
  13.33 kbps with an encoding frame length of 30 ms, and 15.20 kbps with a frame length of 20 ms
  Computational complexity in the range of G.729A
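
The two iLBC rates correspond to fixed frame sizes. The byte counts below (50 and 38) are taken from the iLBC specification, RFC 3951, rather than from the slides; the sketch just confirms they reproduce the quoted rates.

```python
# (frame length in ms, frame size in bytes); byte sizes per RFC 3951.
ILBC_MODES = {"30 ms mode": (30, 50), "20 ms mode": (20, 38)}

def bit_rate_bps(frame_ms, frame_bytes):
    return frame_bytes * 8 * 1000 / frame_ms

for name, (ms, size) in ILBC_MODES.items():
    print(name, f"{bit_rate_bps(ms, size) / 1000:.2f} kbps")
```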

3-36Internet Telephony

Speex

Open-source, patent-free speech codec
CELP (code-excited linear prediction) codec
Operating modes:
  Narrowband (8 kHz sampling rate)
    2.15 – 24.6 kb/s
    Delay of 30 ms
  Wideband (16 kHz sampling rate)
    4 – 44.2 kb/s
    Delay of 34 ms
  Ultra-wideband (32 kHz sampling rate)
Intensity stereo encoding
Variable bit rate (VBR) possible
Voice activity detection (VAD)

3-37Internet Telephony

Cascaded codecs
  E.g., G.711 stream -> G.729 encoder/decoder
  Might not even come close to G.729
  Each coder only generates an approximation of the incoming signal
Audio samples
  http://www.cs.columbia.edu/~hgs/audio/codecs.html

3-38Internet Telephony

Effects of packetization

3-39Internet Telephony

Tones, Signal, and DTMF Digits

The hybrid codecs are optimized for human speech
Other data may need to be transmitted
  Tones: fax tones, dialing tone, busy tone
  DTMF digits for two-stage dialing or voice-mail
G.711 is OK
G.723.1 and G.729 can be unintelligible
The ingress gateway needs to intercept the tones and DTMF digits
  Use an external signaling system

3-40Internet Telephony

Easy at the start of a call
Difficult in the middle of a call
Encode the tones differently from the speech
  Send them along the same media path
  An RTP packet provides the name of the tone and the duration
  Or, a dynamic RTP profile; an RTP packet containing the frequency, volume and the duration
RFC 2198
  An RTP payload format for redundant audio data
  Sending both types of RTP payload

3-41Internet Telephony

RTP Payload Format for DTMF Digits
  An Internet Draft
  Both methods described before
  A large number of tones and events
    DTMF digits, a busy tone, a congestion tone, a ringing tone, etc.
The named events
  E: the end of the tone, R: reserved

3-42Internet Telephony

Payload format
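
The payload diagram is not reproduced in this transcript. As a sketch, the code below packs and unpacks a named event assuming the four-octet layout later published as RFC 2833: an 8-bit event code, the E bit, the R bit, a 6-bit volume, and a 16-bit duration. The function names are illustrative.

```python
import struct

def pack_event(event, end, volume, duration):
    """Build a 4-octet named-event payload (layout assumed from RFC 2833)."""
    if not (0 <= event <= 255 and 0 <= volume <= 63 and 0 <= duration <= 0xFFFF):
        raise ValueError("field out of range")
    flags = (0x80 if end else 0) | volume      # R bit (0x40) is left at zero
    return struct.pack("!BBH", event, flags, duration)

def unpack_event(payload):
    event, flags, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags & 0x80),       # E: the end of the tone
        "reserved": bool(flags & 0x40),  # R: reserved
        "volume": flags & 0x3F,
        "duration": duration,            # in timestamp units
    }

payload = pack_event(event=5, end=True, volume=10, duration=800)
print(payload.hex())
print(unpack_event(payload))
```

Here event 5 stands for DTMF digit 5, and a duration of 800 timestamp units corresponds to 100 ms at an 8000 Hz RTP clock.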