Speech & Audio Processing

Speech & Audio Coding Examples

April 22, 2023, Veton Këpuska

A Simple Speech Coder: LPC-Based Analysis Structure

[Block diagram with labels: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Residual, Filter Coeffs, and a final block whose label is truncated to "Qu" in the source.]

Windowing Analysis Stage

N – Length of the Analysis Window: 10-30 msec

Some Analysis Windows

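MATLAB Useful Functions

wintool
  Use “doc wintool” for more information.

window
  Use “doc window” for the list of supported windows.
  Define your own window if needed, e.g., the sine window and the Vorbis window (n = 0, ..., N-1):

$w_{\text{vorbis}}[n] = \sin\left(\frac{\pi}{2}\,\sin^2\left(\frac{\pi (n + 0.5)}{N}\right)\right)$

$w_{\text{sine}}[n] = \sin\left(\frac{\pi (n + 0.5)}{N}\right)$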
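The two user-defined windows can be sketched in a few lines. The slides point to MATLAB; Python is used here only for illustration. The helper names (`sine_window`, `vorbis_window`, `frame_length`) and the 8 kHz sampling rate are assumptions, not taken from the slides.

```python
import math

def sine_window(N):
    """Sine window: w[n] = sin(pi * (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def vorbis_window(N):
    """Vorbis window: w[n] = sin((pi / 2) * sin(pi * (n + 0.5) / N) ** 2)."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / N) ** 2)
            for n in range(N)]

def frame_length(fs_hz, duration_ms):
    """Number of samples in an analysis window of the given duration."""
    return int(round(fs_hz * duration_ms / 1000.0))

# A 20 msec window (inside the 10-30 msec range) at an assumed 8 kHz rate.
N = frame_length(8000, 20)
w_sine = sine_window(N)
w_vorbis = vorbis_window(N)
```

Both windows are symmetric and satisfy w[n]^2 + w[n + N/2]^2 = 1, which is what makes them usable with overlapped transforms such as the MDCT.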

LPC Analysis Stage

LPC method described in:
  Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.pptx

Summary:
  Perform autocorrelation.
  Solve the system of equations with the Levinson-Durbin method.

MATLAB help:
  doc lpc, etc.
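The two summary steps can be sketched as follows. This is an illustrative Python version, not the MATLAB `lpc` routine. It assumes the predictor convention s[n] ≈ sum_k a_k s[n-k]; the function names and the 1-based storage (`a[0]` unused) are choices made for this sketch.

```python
def autocorrelation(x, p):
    """r[k] = sum_n x[n] * x[n + k] for lags k = 0..p."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, k, err): a[1..p] are predictor coefficients, k[1..p] are the
    reflection (PARCOR) coefficients, err is the final prediction-error energy.
    Index 0 of a and k is unused and left at 0.0.
    """
    a = [0.0] * (p + 1)
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        prev = a[:]                      # coefficients of order i - 1
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
        err *= 1.0 - k[i] ** 2           # error energy shrinks at every order
    return a, k, err
```

The recursion produces the PARCOR coefficients as a by-product, which is why they are a natural alternative representation of the same filter.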

Example of MATLAB Code

function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N - LPC Filter Order
%

[Block diagram with labels: Impulse Train, Noise Generator, Voiced/Unvoiced, Vocal Tract Model, output ŝ[n].]

$H(z) = \frac{1}{A(z)}$

$s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + g\, e[n]$

Analysis of Quantization Errors

Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
  Double (float64) representation (software emulation)
  Float (float32) representation (software emulation)
  Int (int32) representation (hardware emulation)
  Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil. Example:

sig_hat=fix(sig*2^(B-1))/2^(B-1); % Truncation of sig to B bits.
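For readers without MATLAB, the truncation line can be mirrored in Python. `math.trunc` plays the role of `fix` (round toward zero). The function names are illustrative, and the rounding variant is an addition for comparison.

```python
import math

def truncate_to_bits(x, B):
    """Counterpart of fix(x * 2^(B-1)) / 2^(B-1): round toward zero."""
    scale = 2 ** (B - 1)
    return math.trunc(x * scale) / scale

def round_to_bits(x, B):
    """Same grid, rounded to nearest (halves away from zero)."""
    scale = 2 ** (B - 1)
    return math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale

def max_error(signal, quantizer, B):
    """Largest absolute quantization error over a signal."""
    return max(abs(s - quantizer(s, B)) for s in signal)

# Hypothetical test signal: one period of a sine scaled into (-1, 1).
sig = [0.9 * math.sin(2 * math.pi * n / 64) for n in range(64)]
err_trunc = max_error(sig, truncate_to_bits, 8)
err_round = max_error(sig, round_to_bits, 8)
```

Truncation error is bounded by one quantization step, 2^-(B-1), and rounding error by half a step, so rerunning with different values of B gives a quick view of how precision affects the signal.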

Quantization of Error Signal & Filter Coefficients

ADPCM can be applied to the error signal.

Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
  A small quantization error can have a large effect on the filter characteristics.
  The issue is that the polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).

Alternate representations are possible that have significantly better tolerance to quantization error.
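A small numerical experiment illustrates the sensitivity of the direct form. The second-order filter, the pole position (radius 0.99, angle 0.05π), and the 6-bit word length below are all hypothetical choices for this sketch, not values from the slides.

```python
import cmath
import math

def truncate_to_bits(x, B):
    """fix(x * 2^(B-1)) / 2^(B-1): truncate onto a B-bit grid."""
    scale = 2 ** (B - 1)
    return math.trunc(x * scale) / scale

def upper_pole(a1, a2):
    """Upper-half-plane pole of H(z) = 1 / (1 - a1 z^-1 - a2 z^-2)."""
    # The poles are the roots of z^2 - a1 z - a2 = 0.
    return (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0

r, theta = 0.99, 0.05 * math.pi           # narrow resonance near the unit circle
a1, a2 = 2 * r * math.cos(theta), -r * r  # direct-form coefficients
q1, q2 = truncate_to_bits(a1, 6), truncate_to_bits(a2, 6)

pole = upper_pole(a1, a2)
pole_q = upper_pole(q1, q2)

coeff_change = abs(q1 - a1) / abs(a1)
angle_change = abs(cmath.phase(pole_q) - cmath.phase(pole)) / cmath.phase(pole)
```

In this example the coefficient `a1` moves by about 1% while the pole angle, and hence the resonance frequency, moves by more than 10%.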

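LPC Filter Representations

As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.

LPC to PARCOR:

$k_i = a_i^{(i)}$

$a_j^{(i-1)} = \frac{a_j^{(i)} + k_i\, a_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1$

for $i = p, p-1, \ldots, 1$

PARCOR Filter Representation

PARCOR to LPC:

$a_i^{(i)} = k_i$

$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$

for $i = 1, 2, \ldots, p$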
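The two conversions are the step-down and step-up halves of the Levinson-Durbin recursion. The Python sketch below assumes the convention A(z) = 1 - sum_k a_k z^-k and |k_i| < 1; the function names are illustrative.

```python
def lpc_to_parcor(a):
    """Step-down recursion: a = [a_1, ..., a_p] -> k = [k_1, ..., k_p]."""
    cur = list(a)
    k = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        ki = cur[i - 1]                  # k_i is the last coefficient of order i
        k[i - 1] = ki
        # a_j^(i-1) = (a_j^(i) + k_i * a_(i-j)^(i)) / (1 - k_i^2), j = 1..i-1
        cur = [(cur[j - 1] + ki * cur[i - j - 1]) / (1.0 - ki * ki)
               for j in range(1, i)]
    return k

def parcor_to_lpc(k):
    """Step-up recursion: k = [k_1, ..., k_p] -> a = [a_1, ..., a_p]."""
    a = []
    for i, ki in enumerate(k, start=1):
        # a_j^(i) = a_j^(i-1) - k_i * a_(i-j)^(i-1), j = 1..i-1, and a_i^(i) = k_i
        a = [a[j - 1] - ki * a[i - j - 1] for j in range(1, i)] + [ki]
    return a
```

For example, `parcor_to_lpc([0.5, -0.3, 0.2])` gives approximately `[0.71, -0.43, 0.2]`, and `lpc_to_parcor` maps that result back. Because each |k_i| < 1 guarantees a stable synthesis filter, the PARCOR values can be quantized independently without risking instability.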

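Line Spectral Frequency Representation

It turns out that the PARCOR coefficients can be represented with Line Spectral Frequencies (LSF), which have significantly better properties under quantization. Note that:

$H(z) = \frac{1}{A(z)}$

The PARCOR lattice structure of the LPC synthesis filter above:

[Lattice diagram with labels: Input, Output, stage signals A_0, ..., A_{p-1}, A_p and B_0, ..., B_{p-1}, B_p, a $-z^{-1}$ branch, and a coefficient k.]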

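Line Spectral Frequency Representation

From the previous slide the following holds:

$A_i(z) = A_{i-1}(z) - k_i\, B_{i-1}(z)$

$B_i(z) = z^{-1}\left[B_{i-1}(z) - k_i\, A_{i-1}(z)\right]$

From this realization of the filter the LSP (Line Spectral Pair) representation is derived.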

LSF Representation

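$A_p(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$

$Q(z) = A_p(z) + B_p(z), \quad k_{p+1} = -1$

$P(z) = A_p(z) - B_p(z), \quad k_{p+1} = +1$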

LPC Synthesis Filter with LSF

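$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \left[A(z) - 1\right]}, \quad A(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$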

A Simple Speech Coder: LPC-Based Synthesis Structure

[Block diagram with labels: Residual Signal, Filter Coeffs, Synthesis Filter, De-emphasis, Audio Output.]
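The synthesis path mirrors the analysis path, which a single-frame round trip in Python makes concrete. Several things here are assumptions made for the sketch: the pre-emphasis factor `mu = 0.95`, the fixed coefficient set, the absence of windowing and quantization, and a gain of 1 absorbed into the residual.

```python
def pre_emphasis(x, mu=0.95):
    """y[n] = x[n] - mu * x[n-1]."""
    return [x[n] - mu * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def de_emphasis(y, mu=0.95):
    """Inverse of pre_emphasis: x[n] = y[n] + mu * x[n-1]."""
    x = []
    for n, v in enumerate(y):
        x.append(v + mu * (x[n - 1] if n > 0 else 0.0))
    return x

def analysis_filter(s, a):
    """Residual e[n] = s[n] - sum_k a_k s[n-k], i.e. A(z) applied to s."""
    p = len(a)
    return [s[n] - sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n >= k)
            for n in range(len(s))]

def synthesis_filter(e, a):
    """s[n] = e[n] + sum_k a_k s[n-k], i.e. 1 / A(z) applied to e."""
    p = len(a)
    s = []
    for n in range(len(e)):
        s.append(e[n] + sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n >= k))
    return s

# Round trip with hypothetical data and coefficients.
audio_in = [float((5 * n) % 9 - 4) for n in range(48)]
coeffs = [0.71, -0.43, 0.2]
residual = analysis_filter(pre_emphasis(audio_in), coeffs)
audio_out = de_emphasis(synthesis_filter(residual, coeffs))
```

Without quantization, `audio_out` matches `audio_in` up to floating-point error. Any distortion in a real coder therefore comes from quantizing the residual and the filter coefficients.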

Audio Coding

Most of the audio coding standards use principles of psychoacoustics.

Example of the basic structure of an MP3 encoder:

[Block diagram with labels: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream.]

Basic Structure of Audio Coders
  Filterbank Processing
  Psychoacoustic Model
  Quantization

Filter Bank Analysis Synthesis

Filterbank Processing:
  Splitting the full-band signal into several sub-bands:
    Uniform sub-bands (FFT)
    Critical bands (FFT followed by a non-linear transformation)
      Reflect the human auditory apparatus.
      Mel-scale and Bark-scale transformations

Mel-Scale (f in Hz):

$\text{Mel}(f) = 1127.01048\,\ln\left(1 + \frac{f}{700}\right)$

Bark-Scale (f in Hz):

$\text{Bark}(f) = 13\arctan(0.00076 f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$
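Both frequency warpings are one-line functions. The Python sketch below uses the constants from the slide; the function names and the inverse Mel mapping are additions for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 1127.01048 * ln(1 + f / 700)."""
    return 1127.01048 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.01048) - 1.0)

def hz_to_bark(f_hz):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Map a few hypothetical FFT bin centre frequencies onto both scales.
bins_hz = [125.0, 500.0, 1000.0, 4000.0, 8000.0]
bins_mel = [hz_to_mel(f) for f in bins_hz]
bins_bark = [hz_to_bark(f) for f in bins_hz]
```

With these constants 1000 Hz maps to approximately 1000 mel and to roughly 8.5 Bark. Both scales are close to linear at low frequencies and compress the high-frequency range.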

Analysis Structure of Filterbank

[Block diagram with labels: Audio Input, h_k[n], ↓, MDCT, Bit Stream.]

h_k[n] – Impulse response of the k-th quadrature mirror filter
N – Number of channels, typically 32
↓ – Down-sampling
MDCT – Modified Discrete Cosine Transform

Synthesis Structure of Filterbank

[Block diagram with labels: Bit Stream, IMDCT, ↑, g_k[n], Audio Output.]

g_k[n] – Impulse response of the k-th inverse quadrature mirror filter
N – Number of channels, typically 32
↑ – Up-sampling
IMDCT – Inverse Modified Discrete Cosine Transform
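The MDCT/IMDCT pair named in the legends can be sketched in isolation, without the polyphase filters h_k[n] and g_k[n]. This direct O(N^2) Python version uses a sine window and 50% overlap-add. The block size, window choice, and function names are assumptions for illustration; real coders use fast algorithms.

```python
import math

def sine_window(length):
    """w[n] = sin(pi * (n + 0.5) / length)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """2N input samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scaling, windowed case)."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N)) for n in range(2 * N)]

def mdct_round_trip(x, N):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop N.

    len(x) must be a multiple of N.
    """
    w = sine_window(2 * N)
    padded = [0.0] * N + list(x) + [0.0] * N    # every sample sits in two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out[N:N + len(x)]
```

Each block of 2N samples yields only N coefficients, so a single block cannot be inverted on its own. The time-domain aliasing cancels when neighbouring blocks are overlap-added, which is why the transform is critically sampled yet perfectly reconstructing.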

Psycho-Acoustic Modeling

Psychoacoustic Model
  Models the masking threshold according to human auditory perception.
  The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
  The analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing
  The absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
  Any signal below the threshold can be removed without any effect on perception.
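The slides do not give a formula for the threshold in quiet. A commonly used closed-form approximation is Terhardt's, in dB SPL, and it is brought in here as an assumption to make the idea concrete. The function names are illustrative.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold (dB SPL).

    Not taken from the slides; f_hz must be positive.
    """
    f = f_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible_in_quiet(level_db_spl, f_hz):
    """True if a tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(f_hz)

# A hypothetical 10 dB SPL component at three frequencies.
audible = {f: is_audible_in_quiet(10.0, f) for f in (100.0, 1000.0, 3300.0)}
```

Under this approximation the ear is most sensitive around 3 to 4 kHz. A 10 dB SPL component is inaudible at 100 Hz but audible at 1 kHz and 3.3 kHz, so a coder could drop the 100 Hz component.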

Threshold of Hearing

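Frequency Masking

Schröder Spreading Function:

$10\log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\sqrt{1 + (\Delta z + 0.474)^2}$ [dB]

$\Delta z = \text{Bark}(f_{\text{maskee}}) - \text{Bark}(f_{\text{masker}})$

Bark Scale Function:

$\text{Bark}(f) = 13\arctan(0.00076 f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$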
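The spreading function is easy to evaluate numerically. In this Python sketch the sign convention (maskee minus masker, in Bark) and the function names are assumptions. The 1 kHz masker and the 900 Hz and 5 kHz maskees follow the audio examples listed on the masking-curve slide.

```python
import math

def hz_to_bark(f_hz):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def schroeder_spreading_db(dz):
    """Spreading function in dB for a Bark distance dz (maskee minus masker)."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

def spreading_between_db(f_masker_hz, f_maskee_hz):
    """Attenuation of the masker's influence at the maskee frequency."""
    return schroeder_spreading_db(hz_to_bark(f_maskee_hz) - hz_to_bark(f_masker_hz))

near = spreading_between_db(1000.0, 900.0)    # maskee close to the masker
far = spreading_between_db(1000.0, 5000.0)    # maskee far above the masker
```

The function is approximately 0 dB at zero distance, falls by roughly 25 dB per Bark towards lower frequencies, and by roughly 10 dB per Bark towards higher ones. `near` is only a few dB down, consistent with the 900 Hz tone being masked, while `far` is tens of dB down, consistent with the 5 kHz tone staying audible.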

Masking Curve

Primary Tone 1kHz

Masked Tone 900 Hz

Combined Sound 1kHz + 0.9kHz

Combined 1kHz + 0.9kHz (-10dB)

Combined 1kHz + 5kHz (-10dB)
