Speech Coders
1 Prof. Hardip K. Shah, EC Dept., DDU 10/4/2013
INTRODUCTION
• Speech coding is a form of source coding
• It determines
– Quality of the recovered speech
– Capacity of the system
• Requirement
– Low bit rate output with toll-quality speech
• 2G speech coders
– Half-rate coders support twice the number of users per channel at toll quality
Goal of Speech Coders
• To transmit speech with highest possible quality using the least possible channel capacity.
• To keep implementation complexity and communication delay within the required levels.
CLASSIFICATION
Characteristics of Speech Signal
• Band-limited signal
• Non-uniform pdf of speech amplitude
• Non-zero autocorrelation between successive speech samples
• Non-flat speech spectrum
• Existence of voiced and unvoiced segments
• Quasi-periodicity of voiced segments
Speech spectrum
Speech production
• Generation: air is forced from the lungs
• The vocal cords vibrate
• The nasal cavity and vocal tract vibrate
• Classification of speech signals:
1) Voiced signal
“m”, “n”, “v” etc. are the result of quasi-periodic vibrations of the vocal cords
2) Unvoiced signal
“f”, “s”, “sh” etc. are fricatives produced by turbulent air flow through a constriction.
Speech parameters
– Voice pitch
– The pole frequencies of the modulating filter
– The corresponding amplitudes
• Pitch: for certain voiced sounds, the vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of the voice.
• The pole frequencies correspond to the resonant frequencies of the vocal tract and are known as formants.
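The pitch definition above can be illustrated with a small estimator. This is a minimal sketch, not a method taken from the slides: it assumes an 8 kHz sampling rate, a pitch search range of 50 to 400 Hz, and an idealised voiced excitation made of one pulse per pitch period. The function name `estimate_pitch_hz` is hypothetical.

```python
import numpy as np

FS = 8000  # sampling rate in Hz (assumption)

def estimate_pitch_hz(frame, fs=FS, f_min=50.0, f_max=400.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    n = len(frame)
    # Autocorrelation of the frame at each candidate lag
    r = [np.dot(frame[:n - lag], frame[lag:]) for lag in range(lag_min, lag_max + 1)]
    best_lag = lag_min + int(np.argmax(r))
    return fs / best_lag

# Idealised voiced excitation: one pulse every 80 samples (a 100 Hz pitch at 8 kHz)
frame = np.zeros(480)
frame[::80] = 1.0
print(estimate_pitch_hz(frame))
```

Real speech is only quasi-periodic, so practical estimators usually add smoothing and a voiced/unvoiced check on top of a search like this one.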
The Speech Signal
Figure: the speech signal, with the pitch period, background signal, and unvoiced signal (noise-like sound) marked
Quantization Techniques
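• Process of mapping a continuous range of signal amplitudes onto a finite set of discrete amplitudes
• Irreversible process
• Introduces distortion
• Performance measures
$\mathrm{MSE} = E[(x - f_Q(x))^2]$, where $f_Q(x)$ is the quantized value of $x$
$\mathrm{SQNR} = 6.02n + \alpha$ dB, where $n$ is the number of bits per sample and $\alpha$ is a constant that depends on the quantizer and the input signal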
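The 6.02n term in the SQNR expression can be checked numerically. The following is a minimal sketch, assuming a mid-rise uniform quantizer and a full-scale, uniformly distributed test input; both are illustrative choices, not taken from the slides. `uniform_quantize` and `sqnr_db` are hypothetical helper names.

```python
import numpy as np

def uniform_quantize(x, n_bits, x_max=1.0):
    """Mid-rise uniform quantizer with 2**n_bits levels on [-x_max, x_max]."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    idx = np.floor(x / step)
    idx = np.clip(idx, -levels // 2, levels // 2 - 1)  # keep indices in range
    return (idx + 0.5) * step  # reconstruct at the centre of each cell

def sqnr_db(x, n_bits):
    """Signal-to-quantization-noise ratio in dB, measured on the samples x."""
    err = x - uniform_quantize(x, n_bits)
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)  # assumed test signal
for n in (6, 8, 10):
    print(n, "bits:", round(sqnr_db(x, n), 2), "dB")
```

For this particular input the offset α is close to zero, so each added bit buys roughly 6 dB. Other input distributions shift α but leave the slope unchanged.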
Types of quantization
• Uniform quantization
• Non-uniform quantization
– Distributes quantization levels in accordance with the pdf of the input waveform
– In telephony, logarithmic quantizers are used
• Adaptive quantization
– Speech is a non-stationary signal
– Its long-term and short-term pdfs are different
– Dynamic range of 40 dB
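As an illustration of a logarithmic quantizer, here is a minimal sketch of μ-law companding around a uniform quantizer. The slides do not name a specific law; μ-law with μ = 255 is assumed here because it is the value used in North American telephony. The function names are hypothetical.

```python
import numpy as np

MU = 255.0  # assumption: mu-law parameter used in North American telephony

def mu_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with logarithmic compression."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def mu_law_quantize(x, n_bits=8, mu=MU):
    """Compress, quantize uniformly in the compressed domain, then expand."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    y = mu_compress(x, mu)
    idx = np.clip(np.floor(y / step), -levels // 2, levels // 2 - 1)
    return mu_expand((idx + 0.5) * step, mu)

quiet = np.linspace(-0.01, 0.01, 101)  # low-level input, 40 dB below full scale
print("max error on quiet input:", np.max(np.abs(quiet - mu_law_quantize(quiet))))
```

The compression spends more levels on small amplitudes, which is where speech samples are concentrated, so quiet passages are quantized much more finely than an 8-bit uniform quantizer would allow.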
Adaptive quantizers
Vector Quantization
• Shannon’s rate-distortion theorem
– There exists a mapping from a source waveform to output code words such that, for a given distortion D, R(D) bits per sample are sufficient to reconstruct the waveform.
• R(D): rate-distortion function
• Better performance is obtained by coding many samples at a time
• $R = (\log_2 n)/L$ bits per sample
– n: size of the VQ codebook, L: number of samples in a block.
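The rate formula and the block-at-a-time idea can be sketched as follows. This is an illustrative example only: the four-entry, two-sample codebook and the input blocks are made-up values, and `vq_rate` and `vq_encode` are hypothetical names.

```python
import numpy as np

def vq_rate(codebook_size, block_len):
    """R = log2(n) / L, in bits per sample."""
    return np.log2(codebook_size) / block_len

def vq_encode(blocks, codebook):
    """Return, for each block, the index of the nearest codeword (squared error)."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy codebook with n = 4 codewords of L = 2 samples each (made-up values)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
blocks = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]])

indices = vq_encode(blocks, codebook)
print("indices:", indices)
print("rate:", vq_rate(len(codebook), codebook.shape[1]), "bits per sample")
```

Only the codeword index is transmitted for each block, so a 1024-entry codebook of 40-sample codewords costs 10 bits per 40 samples, or 0.25 bits per sample.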
Frequency domain Coders
• Speech is divided into several frequency components
• The frequency components are quantized and coded separately
– Sub Band coders (SBC)
– Block Transform coding
SBC
• Controls and distributes quantization noise across the signal spectrum
• The human ear does not detect quantization error at all frequencies equally well.
• Speech is divided into four or eight bands

Sub-band number   Frequency range (Hz)   Assigned bits
1                 200-700                4
2                 700-1310               3
3                 1310-2020              2
4                 2020-3200              2
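The bit allocation in the table translates into an overall encoding rate. The sketch below assumes that each sub-band is sampled at twice its bandwidth and that no side information is transmitted; neither assumption is stated on the slide. `sbc_bit_rate` is a hypothetical helper.

```python
# (low_hz, high_hz, assigned_bits) for each sub-band, taken from the table
SUB_BANDS = [(200, 700, 4), (700, 1310, 3), (1310, 2020, 2), (2020, 3200, 2)]

def sbc_bit_rate(bands):
    """Total bit rate in bits per second for the given sub-band allocation."""
    total = 0
    for low_hz, high_hz, bits in bands:
        sample_rate = 2 * (high_hz - low_hz)  # assumption: twice the band's width
        total += sample_rate * bits
    return total

print(sbc_bit_rate(SUB_BANDS), "bits per second")
```

The lowest band gets the most bits per sample but contributes only a moderate share of the total because its bandwidth, and hence its sampling rate, is small.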
SBC
Vocoders
LPC
• Time-domain vocoder
• Computationally intensive but more popular
• Good-quality voice at 4.8 kbps and poorer-quality voice at even lower rates
• Models the vocal tract as an all-pole filter
• Transmits the gain factor, voiced/unvoiced decision, pitch frequency, and predictor coefficients
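The predictor coefficients of the all-pole model are commonly estimated with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which method is used, so the following is a sketch under that assumption; `lpc_coefficients` is a hypothetical name and the test signal is a synthetic second-order all-pole process rather than real speech.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method solved with the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and A(z) = sum_k a[k] z^-k is the prediction
    error filter, so the all-pole model is gain / A(z); err is the final
    prediction error energy.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic all-pole signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(20_000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

a, err = lpc_coefficients(x, order=2)
print(a)  # should be close to [1, -1.3, 0.6]
```

In a vocoder this analysis would run once per frame, typically with an order of about ten, and the resulting coefficients would be quantized before transmission.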
LPC
LPC Variants
• Multi-pulse excited LPC (MPE-LPC)
– Excitation by a single pulse per pitch period produces audible distortion
– Typically eight pulses per period are used instead
Multi-Pulse Coder
Code Excited LP
• Prediction on each frame by codewords from a VQ-generated codebook
• 40-sample codewords for a 5 msec frame at 8 kHz sampling rate
• Can use either a “deterministic” or a “stochastic” codebook; 10-bit codebooks are common
• Stochastic codebooks are motivated by the observation that the histogram of the residual from the long-term predictor is roughly a Gaussian pdf => construct the codebook from white Gaussian random numbers with unit variance
• The CDMA cellular standard (IS-95) uses variable-rate CELP, 1.2 to 14.4 kbps
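The stochastic codebook and its search can be sketched as below. This is a deliberately simplified illustration: a real CELP coder passes each codeword through the LPC synthesis filter and a perceptual weighting filter before comparing it with the target, and both steps are omitted here. The function names and the random seed are assumptions.

```python
import numpy as np

FRAME_LEN = 40       # 5 msec at 8 kHz
CODEBOOK_BITS = 10   # 2**10 = 1024 codewords

def make_stochastic_codebook(seed=0):
    """Codebook of white Gaussian codewords with unit variance."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((2 ** CODEBOOK_BITS, FRAME_LEN))

def search_codebook(target, codebook):
    """Return (index, gain) minimising ||target - gain * codeword||^2."""
    corr = codebook @ target                  # correlation with every codeword
    energy = np.sum(codebook ** 2, axis=1)    # energy of every codeword
    index = int(np.argmax(corr ** 2 / energy))
    gain = corr[index] / energy[index]        # optimal gain for that codeword
    return index, gain

codebook = make_stochastic_codebook()
target = 2.5 * codebook[123]  # a target that is exactly a scaled codeword
print(search_codebook(target, codebook))
```

Only the 10-bit index and a quantized gain need to be sent for each 40-sample frame, which is what keeps the excitation cost so low.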
VQ Codebook of LPC Vectors
64 vectors in a codebook of spectral shapes
RELP
Choosing Speech Codecs
– Perceived quality of speech
– Output bit rate
– Cost
– Capacity
– End-to-end encoding delay
– Algorithmic complexity
– DC power requirement
– Compatibility with existing standards
– Robustness of the encoded speech to transmission errors
– Cell size
– Multiple access technique
– Modulation technique
GSM CODEC
• Regular pulse excited long-term prediction (RPE-LTP)
• Combines the advantages of RELP and MPE-LTP
• Relatively complex and power hungry
• The encoder consists of four processing blocks
GSM CODEC
GSM CODEC
Measures of Speech Coder Quality
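$$\mathrm{SNR} = 10\log_{10}\frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1}\left[s(n)-\hat{s}(n)\right]^2} \quad \text{(over the whole signal)}$$
$$\mathrm{SNR}_{seg} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{SNR}_k \quad \text{(over frames of 10-20 msec)}$$
• Good primarily for waveform coders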
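Both measures on this slide, the SNR over the whole signal and the segmental SNR averaged over short frames, are straightforward to compute. The sketch below takes s as the original signal and s_hat as the decoded signal. It assumes 160-sample frames (20 msec at an 8 kHz sampling rate) and ignores any trailing partial frame; the function names are hypothetical.

```python
import numpy as np

def snr_db(s, s_hat):
    """SNR in dB: signal energy over the energy of the error s - s_hat."""
    noise = s - s_hat
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum(noise ** 2))

def segmental_snr_db(s, s_hat, frame_len=160):
    """Average of per-frame SNRs; 160 samples = 20 msec at 8 kHz (assumed)."""
    n_frames = len(s) // frame_len
    frame_snrs = [
        snr_db(s[k * frame_len:(k + 1) * frame_len],
               s_hat[k * frame_len:(k + 1) * frame_len])
        for k in range(n_frames)
    ]
    return float(np.mean(frame_snrs))

# Toy example: constant signal with a constant error of 0.1
s = np.ones(800)
s_hat = s + 0.1
print(snr_db(s, s_hat), segmental_snr_db(s, s_hat))
```

A frame with zero error or zero signal energy would give an infinite value here, so practical implementations usually clamp each per-frame SNR to a fixed range before averaging.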
Measures of Speech Coder Quality
• Intelligibility: Diagnostic Rhyme Test (DRT)
– compare words that differ in leading consonant
– identify spoken word as one of a pair of choices
– high scores (~90%) obtained for all coders above 4 kbps
• Subjective quality: Mean Opinion Score (MOS)
– 5 excellent quality
– 4 good quality
– 3 fair quality
– 2 poor quality
– 1 bad quality
• MOS scores for high quality wideband speech (~4.5) and for high quality telephone bandwidth speech (~4.1)
Mean Opinion Score(MOS)