Speech & Audio Processing

Post on 25-Feb-2016

Speech & Audio Coding Examples

April 22, 2023 Veton Këpuska 2

A Simple Speech Coder: LPC-Based Analysis Structure

[Block diagram. Blocks: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Quantization. Signals: Residual, Filter Coeffs.]

Windowing Analysis Stage

N: length of the analysis window, 10-30 msec

Some Analysis Windows

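MATLAB Useful Functions

- wintool: use "doc wintool" for more information.
- window: use "doc window" for the list of supported windows.
- Define your own window if needed, e.g. the sine window and the Vorbis window:

$$w_{\text{sine}}[n] = \sin\!\left(\frac{\pi\,(n + 0.5)}{N}\right)$$

$$w_{\text{vorbis}}[n] = \sin\!\left(\frac{\pi}{2}\,\sin^{2}\!\left(\frac{\pi\,(n + 0.5)}{N}\right)\right)$$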
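The two user-defined windows can be sketched in a few lines. This is a Python illustration rather than the MATLAB the slides refer to; the function names, the index range n = 0..N-1, and the 8 kHz sampling rate are assumptions made for the example.

```python
import math

def sine_window(N):
    """Sine window: w[n] = sin(pi * (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def vorbis_window(N):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / N))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / N) ** 2)
            for n in range(N)]

def frame_length(fs_hz, frame_ms):
    """Number of samples in an analysis window lasting frame_ms milliseconds."""
    return int(round(fs_hz * frame_ms / 1000.0))

# A 20 msec window (inside the 10-30 msec range) at an assumed 8 kHz rate.
N = frame_length(8000, 20)
w_sine = sine_window(N)
w_vorbis = vorbis_window(N)
```

Both windows are symmetric, and the squares of two samples half a window apart sum to one, which is the property that makes them usable with overlapped transforms such as the MDCT.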

LPC Analysis Stage

- LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.pptx
- Summary:
  - Perform autocorrelation.
  - Solve the system of equations with the Levinson-Durbin method.
- MATLAB help: doc lpc, etc.
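The two summary steps can be sketched as follows. This is a minimal Python illustration, not the MATLAB `lpc` routine; the function names are hypothetical, and the sign convention assumes a predictor of the form ŝ[n] = Σ α_k s[n-k].

```python
def autocorrelation(x, p):
    """Autocorrelation r[0..p] of one (already windowed) frame x."""
    n = len(x)
    return [sum(x[m] * x[m + lag] for m in range(n - lag)) for lag in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the normal equations for a predictor of order p.

    Returns (alpha, k, err): predictor coefficients alpha_1..alpha_p,
    PARCOR coefficients k_1..k_p, and the final prediction error energy.
    """
    a = [0.0] * (p + 1)          # a[j] holds a_j^(i); a[0] is unused
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        prev = a[:]
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
        err *= (1.0 - k[i] ** 2)
    return a[1:], k[1:], err

# Autocorrelation of a first-order process with coefficient 0.5.
alpha, parcor, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the second-order solution collapses back to a first-order predictor, since the second PARCOR coefficient is zero.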

Example of MATLAB Code

function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N - LPC Filter Order
%

$$H(z) = \frac{1}{A(z)}$$

$$\hat{s}[n] = \sum_{k=1}^{p} \alpha_k\,\hat{s}[n-k] + g\,e[n]$$

[Block diagram. Blocks: Impulse Train, Noise Generator, Voiced/Unvoiced, Gain, Vocal Tract Model. Output: ŝ[n].]
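The source-filter model on this slide can be illustrated with a short recursion. The sketch below is in Python rather than MATLAB and assumes the difference equation ŝ[n] = Σ α_k ŝ[n-k] + g e[n]; the function names and the example coefficient are hypothetical.

```python
def lpc_synthesize(alpha, excitation, gain=1.0):
    """All-pole synthesis: s_hat[n] = sum_k alpha[k] * s_hat[n-k] + gain * e[n]."""
    p = len(alpha)
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += alpha[k - 1] * out[n - k]
        out.append(acc)
    return out

def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

# One-pole "vocal tract" driven by an impulse train with a pitch period of 4.
voiced = lpc_synthesize([0.5], impulse_train(8, 4))
```

For unvoiced frames the impulse train would be replaced by the output of a noise generator, as the diagram indicates.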

Analysis of Quantization Errors

Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:

- Double (float64) representation (software emulation)
- Float (float32) representation (software emulation)
- Int (int32) representation (hardware emulation)
- Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil

Example:

sig_hat = fix(sig*2^(B-1))/2^(B-1);   % truncation of sig to B bits
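The same truncation can be reproduced outside MATLAB. The Python sketch below mirrors the `fix` example, using `math.trunc` for rounding toward zero; the function name and the test signal are made up for illustration, and the signal is assumed to lie in [-1, 1).

```python
import math

def quantize_fix(sig, B):
    """Truncate each sample toward zero onto a grid with step 2^-(B-1)."""
    scale = 2 ** (B - 1)
    return [math.trunc(s * scale) / scale for s in sig]

sig = [0.7, -0.7, 0.1234, -0.999]
sig_hat = quantize_fix(sig, 4)          # 4-bit representation
max_err = max(abs(a - b) for a, b in zip(sig, sig_hat))
```

The error of each sample stays below one quantization step, 2^-(B-1), and always points toward zero, which is what distinguishes `fix` from `floor` and `round`.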

Quantization of Error Signal & Filter Coefficients

- ADPCM can be applied to the error signal.
- Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
  - A small quantization error can have a large effect on the filter characteristics.
  - The issue is that the polynomial coefficients map non-linearly to the poles of the filter (i.e., the roots of the polynomial).
- Alternate representations are possible that have significantly better tolerance to quantization error.
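The sensitivity of the direct form can be seen on a two-pole filter. The Python sketch below is an illustration with made-up numbers: a pole pair at radius 0.99 and angle 0.05 rad, with the denominator coefficients rounded to a step of 2^-6.

```python
import cmath
import math

def poles_2nd_order(c1, c2):
    """Roots of z^2 + c1*z + c2, i.e. the poles of 1 / (1 + c1 z^-1 + c2 z^-2)."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)
    return (-c1 + disc) / 2.0, (-c1 - disc) / 2.0

def quantize(value, step):
    """Round a coefficient to the nearest multiple of step."""
    return round(value / step) * step

r, theta = 0.99, 0.05                        # assumed pole radius and angle
c1, c2 = -2.0 * r * math.cos(theta), r * r   # exact direct-form coefficients
step = 2.0 ** -6
q1, q2 = quantize(c1, step), quantize(c2, step)

radius_exact = max(abs(z) for z in poles_2nd_order(c1, c2))
radius_quant = max(abs(z) for z in poles_2nd_order(q1, q2))
```

Each coefficient moves by less than half a step, yet the largest pole radius goes from 0.99 to approximately 1.0, so a stable resonator ends up at the edge of instability.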

LPC Filter Representations

As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.

LPC to PARCOR:

$$a_j^{(p)} = \alpha_j, \qquad 1 \le j \le p$$

For $i = p, p-1, \ldots, 1$:

$$k_i = a_i^{(i)}$$

$$a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)}\,a_{i-j}^{(i)}}{1 - k_i^{2}}, \qquad 1 \le j \le i-1$$
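The LPC-to-PARCOR (step-down) recursion can be sketched as below. This is a Python illustration with a hypothetical function name; it assumes the sign convention in which the predictor is ŝ[n] = Σ α_k s[n-k], and that every |k_i| is strictly less than 1 so the denominator never vanishes.

```python
def lpc_to_parcor(alpha):
    """Step-down recursion: predictor coefficients alpha_1..alpha_p -> k_1..k_p."""
    p = len(alpha)
    a = [0.0] + list(alpha)      # a[j] holds a_j^(i); a[0] is unused
    k = [0.0] * (p + 1)
    for i in range(p, 0, -1):
        k[i] = a[i]
        denom = 1.0 - k[i] ** 2
        prev = a[:]
        for j in range(1, i):
            a[j] = (prev[j] + k[i] * prev[i - j]) / denom
    return k[1:]

parcor = lpc_to_parcor([0.65, -0.3])
```

For a stable synthesis filter all PARCOR coefficients lie strictly inside (-1, 1), which gives a quantizer a bounded range to work with.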

PARCOR Filter Representation

PARCOR to LPC:

For $i = 1, 2, \ldots, p$:

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\,a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$\alpha_j = a_j^{(p)}, \qquad 1 \le j \le p$$
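The PARCOR-to-LPC (step-up) recursion is the inner update of Levinson-Durbin without the autocorrelation terms. A minimal Python sketch, with a hypothetical function name and the same sign convention (predictor ŝ[n] = Σ α_k s[n-k]):

```python
def parcor_to_lpc(k):
    """Step-up recursion: PARCOR coefficients k_1..k_p -> alpha_1..alpha_p."""
    p = len(k)
    a = [0.0] * (p + 1)          # a[j] holds a_j^(i); a[0] is unused
    for i in range(1, p + 1):
        prev = a[:]
        a[i] = k[i - 1]
        for j in range(1, i):
            a[j] = prev[j] - k[i - 1] * prev[i - j]
    return a[1:]

alpha2 = parcor_to_lpc([0.5, -0.3])
alpha3 = parcor_to_lpc([0.5, -0.3, 0.2])
```

Raising the order from 2 to 3 changes all predictor coefficients, while the first two PARCOR coefficients stay the same; this order-independence is one reason the PARCOR form is convenient to quantize.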

Line Spectral Frequency Representation

It turns out that PARCOR coefficients can be represented with Line Spectral Frequencies (LSF), which have significantly better properties. Note that:

$$H(z) = \frac{1}{A(z)}$$

The PARCOR lattice structure of the LPC synthesis filter above:

[Lattice diagram. Labels: Input, Output; delay elements $z^{-1}$; reflection coefficients $k_p$, $k_{p-1}$, with $k_0 = -1$ and $k_{p+1} = \mp 1$; signals $A_0$, $A_{p-1}$, $A_p$ and $B_0$, $B_{p-1}$, $B_p$.]

Line Spectral Frequency Representation

From the previous slide the following holds:

$$A_p(z) = A_{p-1}(z) - k_p\,B_{p-1}(z)$$

$$B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p\,A_{p-1}(z)\right]$$

$$A_0(z) = 1 \quad \& \quad B_0(z) = z^{-1}$$

$$B_p(z) = z^{-(p+1)}\,A_p\!\left(z^{-1}\right)$$

From this realization of the filter the LSP representation is derived:

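LSF Representation

$$k_{p+1} = 1: \quad P_{p+1}(z) = A_p(z) - B_p(z)$$

$$k_{p+1} = -1: \quad Q_{p+1}(z) = A_p(z) + B_p(z)$$

$$A_p(z) = \frac{1}{2}\left[P_{p+1}(z) + Q_{p+1}(z)\right]$$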
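The construction of the two LSP polynomials can be checked numerically. The Python sketch below assumes A_p(z) = 1 - Σ α_k z^-k and B_p(z) = z^-(p+1) A_p(1/z), stores each polynomial as a list of coefficients of z^0, z^-1, ..., z^-(p+1), and uses a hypothetical function name.

```python
def lsp_polynomials(alpha):
    """Return coefficient lists of P(z) = A(z) - B(z) and Q(z) = A(z) + B(z)."""
    p = len(alpha)
    A = [1.0] + [-c for c in alpha] + [0.0]   # A_p(z), padded to degree p+1
    B = [0.0] + A[p::-1]                      # reversed A_p(z), delayed by one sample
    P = [x - y for x, y in zip(A, B)]
    Q = [x + y for x, y in zip(A, B)]
    return P, Q

P, Q = lsp_polynomials([0.65, -0.3])
A_back = [(x + y) / 2.0 for x, y in zip(P, Q)]   # recovers A_p(z)
```

P comes out antisymmetric and Q symmetric. The line spectral frequencies are the angles of the roots of these two polynomials; finding those roots is not shown here.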

LPC Synthesis Filter with LSF

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \left(A(z) - 1\right)} = \frac{1}{1 + \frac{1}{2}\left[\left(P_{p+1}(z) - 1\right) + \left(Q_{p+1}(z) - 1\right)\right]}$$

A Simple Speech Coder: LPC-Based Synthesis Structure

[Block diagram. Blocks: Decoding, Synthesis Filter, De-emphasis, Audio Output. Signals: Residual Signal, Filter Coeffs, Residual.]

Audio Coding

Audio Coding

- Most of the audio coding standards use principles of psychoacoustics.
- Example of the basic structure of an MP3 encoder:

[Block diagram. Blocks: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream.]

Basic Structure of Audio Coders

- Filterbank Processing
- Psychoacoustic Model
- Quantization

Filter Bank Analysis Synthesis

Filterbank Processing

Splitting the full-band signal into several sub-bands:

- Uniform sub-bands (FFT)
- Critical bands (FFT followed by a non-linear transformation)
  - Reflect the human auditory apparatus.
  - Mel-scale and Bark-scale transformations:

$$\text{Mel}(f) = 1127.01048\,\ln\!\left(1 + \frac{f}{700}\right)$$

$$\text{Bark}(f) = 13\arctan(0.00076\,f) + 3.5\arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right)$$
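Both frequency warpings are one-line functions. The Python sketch below uses the constants from this slide; the function names are hypothetical and f is taken to be in Hz.

```python
import math

def hz_to_mel(f):
    """Mel(f) = 1127.01048 * ln(1 + f / 700)."""
    return 1127.01048 * math.log(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

mel_1k = hz_to_mel(1000.0)
bark_1k = hz_to_bark(1000.0)
```

With these constants 1000 Hz maps to approximately 1000 mel and to roughly 8.5 Bark. Both scales are close to linear at low frequencies and compress the high end.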

Mel-Scale

Bark-Scale

Analysis Structure of Filterbank

[Block diagram. Audio Input feeds N parallel channels h1[n], ..., hk[n], ..., hN[n]; each channel is down-sampled (↓) and passed through an MDCT; the MDCT outputs go to Quantization, which produces the Bit Stream.]

- hk[n]: impulse response of the kth quadrature mirror filter
- N: number of channels, typically 32
- ↓: down-sampling
- MDCT: Modified Discrete Cosine Transform

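Synthesis Structure of Filterbank

[Block diagram. The Bit Stream passes through Decoding into N channels; each channel goes through an IMDCT, is up-sampled (↑), and is filtered by g1[n], ..., gk[n], ..., gN[n] to form the Audio Output.]

- gk[n]: impulse response of the kth inverse quadrature mirror filter
- N: number of channels, typically 32
- ↑: up-sampling
- IMDCT: Inverse Modified Discrete Cosine Transform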
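The MDCT/IMDCT pair at the core of both structures can be sketched in isolation. The Python code below is an illustration under stated assumptions: it uses the direct O(N^2) definition, a sine window applied at both analysis and synthesis, 50% overlap-add, and a 2/N inverse scaling. It leaves out the 32-channel filterbank (hk[n], gk[n]) and the quantizer, and all function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window of the given (even) length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """MDCT of a 2N-sample frame, returning N coefficients."""
    N = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """IMDCT of N coefficients, returning 2N time-aliased samples."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(x, N):
    """Windowed MDCT -> IMDCT with overlap-add; the hop size is N samples."""
    w = sine_window(2 * N)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [x[start + n] * w[n] for n in range(2 * N)]
        out = imdct(mdct(frame))
        for n in range(2 * N):
            y[start + n] += out[n] * w[n]
    return y

x = [math.sin(0.3 * n) + 0.1 * n for n in range(64)]
y = analysis_synthesis(x, 8)
```

Each frame on its own comes back time-aliased; the aliasing cancels when neighbouring frames are overlapped and added, so every sample covered by two frames is reconstructed. Only the first and last N samples, which are covered by a single frame, are not.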

Psycho-Acoustic Modeling

Psychoacoustic Model

- The masking threshold is determined according to human auditory perception.
- The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
- The analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing

- Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
- Any signal below the threshold can be removed without effect on the perception.
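The slides show the threshold only as a plot. As an illustration, the Python sketch below uses one commonly cited closed-form approximation (Terhardt's formula, in dB SPL); this formula is an assumption brought in for the example and is not taken from the slides, and the function names are hypothetical.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = f_hz / 1000.0                     # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_in_quiet(f_hz, level_db_spl):
    """True if a tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(f_hz)

keep_1k = audible_in_quiet(1000.0, 10.0)
keep_100 = audible_in_quiet(100.0, 10.0)
```

Under this approximation a 10 dB SPL component is audible at 1 kHz but not at 100 Hz, so a coder could drop the latter without a perceptual penalty.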

Threshold of Hearing

Frequency Masking

Schröder spreading function:

$$10\log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\,\sqrt{1 + (\Delta z + 0.474)^{2}}$$

$$\Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}})$$

Bark scale function:

$$z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right)$$
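The spreading function can be evaluated directly. The Python sketch below assumes the spreading function 15.81 + 7.5(Δz + 0.474) - 17.5 sqrt(1 + (Δz + 0.474)^2) in dB, with Δz = z(f_maskee) - z(f_masker) on the Bark scale; the function names are hypothetical, and the 900 Hz / 1 kHz pair is the one used in the listening examples of this deck.

```python
import math

def hz_to_bark(f):
    """z(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function in dB for a Bark distance dz."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

# Maskee at 900 Hz, masker at 1 kHz.
dz = hz_to_bark(900.0) - hz_to_bark(1000.0)
attenuation_db = spreading_db(dz)
```

The function is close to 0 dB at Δz = 0 and falls off faster below the masker than above it, which reflects that masking spreads more strongly toward higher frequencies.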

Masking Curve

Primary Tone 1 kHz

Masked Tone 900 Hz

Combined Sound 1 kHz + 0.9 kHz

Combined 1 kHz + 0.9 kHz (-10 dB)

Combined 1 kHz + 5 kHz (-10 dB)
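Test signals like the ones named on these slides can be generated with a few lines. The Python sketch below is only an illustration: the 16 kHz sampling rate, the 0.5 s duration, the function names, and the reading of "-10 dB" as an amplitude ratio of 10^(-10/20) relative to the primary tone are all assumptions.

```python
import math

def tone(freq_hz, duration_s, fs_hz=16000, level_db=0.0):
    """Sine tone whose amplitude is level_db relative to full scale 1.0."""
    amp = 10.0 ** (level_db / 20.0)
    n_samples = int(round(duration_s * fs_hz))
    return [amp * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_samples)]

def mix(a, b):
    """Sample-by-sample sum of two equally long signals."""
    return [x + y for x, y in zip(a, b)]

primary = tone(1000.0, 0.5)                       # masker
masked = tone(900.0, 0.5, level_db=-10.0)         # maskee, close in frequency
distant = tone(5000.0, 0.5, level_db=-10.0)       # same level, far in frequency
combined_near = mix(primary, masked)
combined_far = mix(primary, distant)
```

Writing the two mixtures to WAV files (for example with the standard `wave` module) lets a listener compare how audible the quieter tone is when it sits close to the masker and when it sits far from it.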

END