Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.
![Page 1: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/1.jpg)
Linear Predictive Coding for Speech Compression
Dev Ghosh
ECE 463
9 March 2006
![Page 2: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/2.jpg)
Overview
General Model for Speech Synthesis
Channel Vocoder
Linear Predictive Coder (LPC-10)
Code Excited Linear Prediction (CELP)
Novel Application: sub-band adaptive filtering based on cochlear model
![Page 3: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/3.jpg)
Model for Speech Synthesis
Speech produced by forcing air through vocal cords, larynx, pharynx, mouth and nose
At transmitter speech is divided into segments
Each segment analyzed to determine excitation signal and parameters of vocal tract filter
Excitation source → Vocal tract filter → Speech
![Page 4: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/4.jpg)
Channel Vocoder - analysis
Each segment of input speech analyzed by a bank of (bandpass) analysis filters
Energy at output of each filter is estimated 50 times a second and transmitted to receiver
Decision made whether segment is voiced (e.g. /a/, /e/, /o/) or unvoiced (e.g. /s/, /f/)
Estimate of pitch period (period of fundamental harmonic) is determined
![Page 5: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/5.jpg)
Voiced vs. Unvoiced Speech
![Page 6: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/6.jpg)
Channel vocoder - synthesis
Vocal tract filter implemented by bank of (bandpass) synthesis filters
For voiced segments, periodic pulse generator is input
For unvoiced segments, pseudonoise source is input
Period determined by pitch estimate
Scaled by output of energy estimate
First approach to speech compression
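The two excitation sources can be sketched in a few lines of Python. This is only an illustration: the function name `make_excitation`, the unit-height pulses, and the uniform noise distribution are assumptions, not details taken from the slides.

```python
import random


def make_excitation(n_samples, voiced, pitch_period=None, gain=1.0, seed=0):
    """Return one segment of excitation (hypothetical helper, illustrative only)."""
    if voiced:
        # Periodic pulse generator: one pulse every pitch_period samples,
        # scaled by the gain derived from the energy estimate.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    # Pseudonoise source: seeded so the same sequence can be regenerated.
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


voiced_exc = make_excitation(10, voiced=True, pitch_period=4, gain=2.0)
unvoiced_exc = make_excitation(10, voiced=False, gain=0.5)
```

In a channel vocoder the resulting sequence would then be passed through the synthesis filter bank, one band gain per filter.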
![Page 7: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/7.jpg)
Linear Predictive Coder
Models vocal tract as a single linear filter
$y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$
Output: $y_n$, Input: $\epsilon_n$, Gain: $G$, Filter order: $M$
Input is random noise (unvoiced) or periodic pulse (voiced)
LPC-10 is a standard (2.4 kbit/s, 8000 samples/s)
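A minimal sketch of the synthesis recursion, assuming past outputs before the start of the segment are zero. The function name `lpc_synthesize` is a placeholder chosen for this example.

```python
def lpc_synthesize(coeffs, excitation, gain=1.0):
    """Run the all-pole recursion y[n] = sum_i a_i * y[n-i] + G * eps[n].

    coeffs holds a_1..a_M; outputs before n = 0 are assumed to be zero.
    """
    out = []
    for n, eps in enumerate(excitation):
        acc = gain * eps
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += a * out[n - i]
        out.append(acc)
    return out


# A single pulse through a first-order filter decays geometrically.
impulse_response = lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0])
```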
![Page 8: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/8.jpg)
LPC - Voiced/Unvoiced Decision
Voiced speech has more energy and lower frequency content than unvoiced
Speech segment is lowpass filtered; energy at output relative to background noise is used to determine voicing
Zero-crossings counted to estimate frequency
Continuity criterion: voicing decisions of neighboring frames taken into account
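The three cues above can be combined in a rough sketch like the following. The 3-point moving average standing in for the lowpass filter, both thresholds, and all function names are assumptions made for illustration; they are not the values used by LPC-10.

```python
def lowpass(segment):
    """3-point moving average; a crude stand-in for a real lowpass filter."""
    padded = [segment[0]] + list(segment) + [segment[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(segment))]


def mean_energy(segment):
    return sum(x * x for x in segment) / len(segment)


def zero_crossing_rate(segment):
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(segment) - 1)


def is_voiced(segment, noise_energy, min_energy_ratio=4.0, max_zc_rate=0.25):
    """Voiced if lowpassed energy is well above the noise floor and the
    zero-crossing rate is low (thresholds are hypothetical)."""
    loud = mean_energy(lowpass(segment)) > min_energy_ratio * noise_energy
    low_frequency = zero_crossing_rate(segment) < max_zc_rate
    return loud and low_frequency


def smooth_decisions(decisions):
    """Continuity: flip an isolated frame that disagrees with both neighbors."""
    out = list(decisions)
    for i in range(1, len(decisions) - 1):
        if decisions[i - 1] == decisions[i + 1] != decisions[i]:
            out[i] = decisions[i - 1]
    return out
```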
![Page 9: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/9.jpg)
LPC - Estimating Pitch Period
Extracting pitch from short noisy segment is difficult
One approach is to maximize autocorrelation
Periodicity isn’t strong enough
Threshold can’t be used because maximum value not known in advance
![Page 10: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/10.jpg)
LPC - Estimating Pitch Period
LPC-10 uses average magnitude difference function (AMDF)
$\mathrm{AMDF}(P) = \frac{1}{N}\sum_{i} |y_i - y_{i-P}|$
If $\{y_n\}$ is periodic with period $P_0$, samples $P_0$ apart will have values close to each other and the AMDF will have a minimum at $P_0$
AMDF is periodic for voiced speech and roughly flat for unvoiced speech
AMDF is minimized when $P$ is the pitch period; spurious minima in unvoiced segments are shallow
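A small sketch of AMDF-based pitch estimation follows. As a simplifying assumption it averages over the sample pairs that fit inside the segment rather than over a fixed window of N samples, and the lag search range passed to `estimate_pitch_period` is hypothetical.

```python
def amdf(samples, lag):
    """Average magnitude difference between samples that are `lag` apart."""
    pairs = len(samples) - lag
    total = sum(abs(samples[i] - samples[i - lag])
                for i in range(lag, len(samples)))
    return total / pairs


def estimate_pitch_period(samples, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda p: amdf(samples, p))


# A sawtooth with period 8 has an exact AMDF null at lag 8.
sawtooth = [float(i % 8) for i in range(64)]
period = estimate_pitch_period(sawtooth, min_lag=2, max_lag=12)
```

The search range matters: any multiple of the true period also produces a deep minimum, so the upper limit should stay below twice the shortest expected period or the result needs a further check.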
![Page 11: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/11.jpg)
LPC - Obtaining Vocal Tract Filter
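At transmitter, we want filter coeffs that best match the segment in a mean squared error sense
$e_n^2 = \left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n\right)^2$
Autocorrelation approach assumes $\{y_n\}$ is stationary
$A = R^{-1}P$ ($R$: autocorrelation matrix, $P$: vector of autocorrelation values)
Recursive solution uses Levinson-Durbin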
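The Levinson-Durbin recursion solves the Toeplitz system without forming $R^{-1}$ explicitly. The sketch below uses one common sign convention for the reflection coefficients (other texts negate them); the function names are chosen for this example.

```python
def autocorrelation(samples, max_lag):
    """Autocorrelation estimates r[0..max_lag], treating samples outside
    the segment as zero."""
    n = len(samples)
    return [sum(samples[i] * samples[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_i a_i r[|i-j|] = r[j], j = 1..order.

    Returns (predictor coeffs a_1..a_M, reflection coeffs, residual energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        err *= 1.0 - k * k
        reflection.append(k)
    return a[1:], reflection, err


# Autocorrelation of a first-order process with a_1 = 0.5.
coeffs, ks, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```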
![Page 12: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/12.jpg)
LPC - Obtaining the Vocal Tract Filter
Covariance approach discards stationarity assumption (not valid for speech signals)
$c_{ij} = E[y_{n-i} y_{n-j}]$
yields $CA = S$
![Page 13: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/13.jpg)
LPC - Obtaining the Vocal Tract Filter
$c_{ij}$ are estimated as
$c_{ij} = \sum_{n} y_{n-i} y_{n-j}$
No longer assume values of $y_n$ outside of segment are zero
Cholesky decomposition required
Reflection coeffs used to update voicing decision
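A minimal sketch of the covariance method, assuming the sums run only over indices for which every referenced sample lies inside the segment, and taking $S$ to be the vector with entries $c_{0j}$. The function name `covariance_lpc` is a placeholder, and the code assumes $C$ is positive definite.

```python
import math


def covariance_lpc(samples, order):
    """Solve C a = s by Cholesky factorization (C = L L^T)."""
    n = len(samples)

    def c(i, j):
        # Only in-segment samples are used; nothing is assumed to be zero.
        return sum(samples[t - i] * samples[t - j] for t in range(order, n))

    C = [[c(i, j) for j in range(1, order + 1)] for i in range(1, order + 1)]
    s = [c(0, j) for j in range(1, order + 1)]

    # Cholesky factorization, lower triangular L.
    L = [[0.0] * order for _ in range(order)]
    for i in range(order):
        for j in range(i + 1):
            acc = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]

    # Forward substitution: L z = s.
    z = []
    for i in range(order):
        z.append((s[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])

    # Back substitution: L^T a = z.
    a = [0.0] * order
    for i in reversed(range(order)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, order))) / L[i][i]
    return a
```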
![Page 14: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/14.jpg)
LPC - Transmitting Parameters
Tenth order filter used for voiced speech and fourth order for unvoiced
Vocal tract filter is sensitive to errors in reflection coeffs with magnitude close to one
$g_i = \frac{1+k_i}{1-k_i}$
are quantized and sent instead of $k_i$
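The transform and its inverse are short enough to write out. The function names below are placeholders; the inverse follows from solving $g = (1+k)/(1-k)$ for $k$.

```python
def reflection_to_g(k):
    """Map a reflection coefficient k (|k| < 1) to g = (1 + k) / (1 - k)."""
    return (1.0 + k) / (1.0 - k)


def g_to_reflection(g):
    """Inverse mapping: k = (g - 1) / (g + 1)."""
    return (g - 1.0) / (g + 1.0)


# Near k = 1 the mapping stretches small differences in k apart,
# so a uniform quantizer on g resolves them more finely.
spread_near_one = reflection_to_g(0.99) - reflection_to_g(0.98)
spread_near_zero = reflection_to_g(0.01) - reflection_to_g(0.00)
```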
![Page 15: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/15.jpg)
Code Excited Linear Prediction
Single pulse per pitch period leads to buzzy twang
Variety of excitation signals is allowed
For each segment, encoder finds the excitation vector that generates the synthesized speech best matching the speech being coded
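The search over excitation vectors can be sketched as an exhaustive analysis-by-synthesis loop. This is a simplified illustration: it uses plain squared error with a least-squares gain per candidate, whereas practical CELP coders add perceptual weighting and further codebook structure. All names and the tiny codebook in the usage lines are hypothetical.

```python
def synthesize(coeffs, excitation):
    """All-pole synthesis y[n] = sum_i a_i * y[n-i] + eps[n], zero initial state."""
    out = []
    for n, eps in enumerate(excitation):
        acc = eps
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += a * out[n - i]
        out.append(acc)
    return out


def best_excitation(target, codebook, coeffs):
    """Return (index, gain, squared error) of the best-matching codebook entry."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(coeffs, vector)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Least-squares optimal gain for this candidate.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]]
target = [0.0, 2.0, 1.0, 0.5]
index, gain, err = best_excitation(target, codebook, coeffs=[0.5])
```

Only the index and the gain would need to be transmitted, since the decoder holds the same codebook.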
![Page 16: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/16.jpg)
Sub-band adaptive filtering
Multi-channel speech enhancement system
The greater the number of sub-bands used, the faster the convergence of the overall system
![Page 17: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/17.jpg)
Cochlear Modelling
Sub-band filters are distributed logarithmically in frequency to approximate distribution of filters in cochlea
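One simple way to place band edges logarithmically is geometric spacing, sketched below. The slides do not say which cochlear frequency scale was used, so the constant-ratio spacing, the function name, and the example frequency range are all assumptions.

```python
def log_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edges with a constant ratio between neighbors."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** i for i in range(n_bands + 1)]


# Four bands between 100 Hz and 1600 Hz: each band spans one octave.
edges = log_band_edges(100.0, 1600.0, 4)
```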
![Page 18: Linear Predictive Coding for Speech Compression Dev Ghosh ECE 463 9 March 2006.](https://reader035.fdocuments.in/reader035/viewer/2022072006/56649cf85503460f949c88af/html5/thumbnails/18.jpg)
Adaptive Noise Cancellation
LMS algorithm is used to model differential transfer function between noise signals in a number of sub-bands
Lower power and shorter filters used in each sub-band
Convergence is equal across all bands if power is distributed equally and filter lengths are the same
Convergence dominated by sub-band with greatest power
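A single-band LMS noise canceller is sketched below to make the adaptation step concrete; in the sub-band system one such filter would run in each band. The function name, tap count, and step size are hypothetical choices.

```python
def lms_cancel(primary, reference, n_taps, step):
    """Adapt an FIR filter so the filtered reference tracks the noise
    in `primary`. Returns (error signal, final weights)."""
    weights = [0.0] * n_taps
    errors = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero before the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        e = primary[n] - estimate  # enhanced output
        weights = [w + step * e * xi for w, xi in zip(weights, x)]
        errors.append(e)
    return errors, weights
```

The step size has to be small relative to the reference power in the band for the update to stay stable, which is one reason band power affects convergence speed.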