Dr. Shyamal Kumar Das Mandal Assistant Professor ...

147
Dr. Shyamal Kumar Das Mandal Assistant Professor [email protected] Centre for Educational Technology Indian Institute of Technology, Kharagpur DIGITAL SPEECH PROCESSING s1

Transcript of Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Page 1: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Dr. Shyamal Kumar Das MandalAssistant Professor

[email protected] for Educational Technology

Indian Institute of Technology, Kharagpur

DIGITAL SPEECH PROCESSING

s1

Page 2: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Slide 1

s1 sdm, 12/14/10

Page 3: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Speech Processing

SignalProcessing Information

Theory

Acoustics Phonetics and Articulatory Phonetics

Algorithms(Programming)

Fourier transformsDiscrete time filtersAR(MA) models

EntropyCommunication theoryRate-distortion theory

Statistical SPStochastic models

PsychoacousticsSpeech production

Page 4: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Speech Processing Model

Information Source

Measurement / Observation

Signal Representation

Signal Transformation

Extraction of Information

Utilization of Information

Human speaker—lots of variability

Acoustic waveform / articulatorypositions/neural control signals

Human listeners,machines

Signal Processing

Page 5: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Digitization and Recording of speech signal, Review of Digital SignalProcessing Concepts

Human Speech production, Acoustic Phonetics and ArticulatoryPhonetics, Different categories speech sounds and Location of sounds inthe acoustic waveform and spectrograms

Uniform Tube Modeling of Speech Production, Speech Perception

Time Domain Methods in Speech Processing, Analysis and Synthesis ofPole-Zero Speech Models

Short-Time Fourier Transform, Analysis:- FT view and Filtering view,Synthesis:-Filter bank summation (FBS) Method and OLA Method

Features Extraction and Extraction of Fundamental frequency

Speech Prosody, Speech Prosody Modeling (Fujisaki Model)

Overview of Speech based Applications development (TTS, ASR andspoken language acquisition)

Course coverage

Page 6: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Categories and labeling of different speech sound for agiven speech signal based on waveform and spectrographicview

Explain the psychoacoustic properties of speech perceptionand production

Design the Uniform tube model for speech sound productionand implement it based on discrete time modeling

Extract the fundamental frequency of speech signal basedon time domain and frequency domain method

Extract spectral parameters and time domain parameters ofspeech signal for speech technology application

Design an simple TTS and ASR system.

Explain the prosodic structure of spoken language anddesign F0 contour modeling based on Fujisaki Model

Course objective

Page 7: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Review of DSP Concepts

Page 8: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Concept of frequency in continuous-time and Discrete-time signal

ex tj

aAt )(

Continuous sinusoidal Time Signal

• For every fixed value of the frequency F x(t) is periodic

• Continuous time sinusoidal signal with distinctfrequencies are themselves distinct.

• Increase the frequency result in increase in the rate ofoscillation of the signalmore period are included.

tAtxacos)(

Complex exponent from

Page 9: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Discrete time Sinusoidal

)cos()( nAnx

A discrete time signal is periodic if its frequency f is arational number

Discrete time sinusoidal whose frequency areseparated by an integer multiple of 2 are identical

The highest rate of oscillation in a discrete timesinusoidal is attained when w= or(‐ ).

Page 10: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 11: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 12: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 13: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Classification of Discrete time signal

Energy signal and power signal: If E is the energy of a signal x[n]

Periodic and aperiodic signal: a signal x[n] is periodic with period N if and only if  x[n+N]=x[n] for all nThe smallest value of N for which holds the above equation is called fundamental period.If there is no value of N that satisfies the above equation then the signal is called aperiodic signal. 

n

nxE ][2

If E is finite then x(n) is called an energy signalMany signal possesses infinite energy, have a finite average power P. The average power define as:

N

NnNnx

NP ][lim

2

121

If P is finite and nonzero then the signal is called a power signal

Symmetric and antisymmetric signal: a real‐valued signal x[n] is called symmetric if  x[‐n]=x[n]On the other hand a signal x[n] is called antisymmetric if x[‐n]=‐x[n]

Page 14: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Discrete time system

Discrete time system is a device or algorithm that operate on a discrete timesignal, called the input according to some well define rule, to produceanother discrete time signal called output of the system.

y[n]=H[x[n]]y[n]=0.8y[n-1]+0.5x[n]+0.9x[n-1]

Z‐1

+

Z‐1

+

x[n]

y[n]

0.8

0.9

0.5

Page 15: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Classification of discrete system

• Static and Dynamic systemA discrete time system is called static or memory less it its output at anyinstant n depends at most on the input sample at the same time, but noton past or future samples of the input. In any other case the system issaid to be dynamic or to have memory.

• Time invariant and time variant system

A system is called time invariant if its input‐output characteristics do not change with time. 

A relaxed system H is time invariant or shift invariant if and only if x[n]  y[n]    implies that x[n‐k] y[n‐k]

If the output y[n-k] not = y[n-k] even for one value of k the system is time variant

H H

Page 16: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

•Linear and non-linear system: A linear system is one that satisfies thesuperposition principle. The principle of superposition requires that theresponse of the system to a weighted sum of signal is equal to thecorresponding weighted sum of the response of the system to each ofthe individual input signal.• Causal and non-causal system: A system is said to be causal if the output of the system at any time n depends only on present and past input, but not depend on future inputs.y[n]=F[x[n],x[n-1],x[n-2]……….x[n-k]] where F is any function If a system does not satisfy the above condition then the system is called Non-causal system.•Stable and unstable system: An arbitrary relaxed system is said to bebounded input-bounded output(BIBO) stable if and only if every boundedinput produces a bounded output. x[n], y[n] are bounded is simplytranslated mathematically to mean that there exist some finite numberssay Mx, My such that

Mxnx ][ Myny ][

for all n

Page 17: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Recursive and Non-recursive discrete system.

Cumulative average of signal x(n)

n

kkx

nny

0][

11][

][][][)1(1

0

nxkxnynn

k

][]1[ nxnny

+ x

x z‐1

y[n]

x[n]

][][][ knxkhnyk

n

1/n+1

A system whose output y[n] at time n depends on anynumber of past output values y[n-1], y[n-2] …. Is calledrecursive system

Page 18: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ConvolutionConvolution is one of the most frequently usedoperations in DSP. Specially in digital filteringapplications where two finite and causal sequencesx[n] and h[n] of lengths N1 and N2 are convolved

0

][][][][][][][kk

knxkhknxkhnxnhny

Page 19: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Operations involved

• Folding• Shifting• Multiplication• Summation

Page 20: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

)()()(

thxty

h()

x()

ty(t)

Page 21: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

h(-)

x()

ty(t)

)()()(

thxty

Page 22: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

)()()(

thxty

h(t-), t=0

x()

ty(t)

Page 23: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

)()()(

thxty

h(t-), t=1

x()

ty(t)

Page 24: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

y(t) x( )h(t

)

h(t-),t=2

x()

ty(t)

Page 25: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

y(t) x( )h(t

)

h(t-),t=7

x()

ty(t)

Page 26: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 27: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Circular convolution• Circular convolution of x(n) and h(n) is defined as the convolutionof h(n) with a periodic signal xp(n) :

nNnxnx p ,mod)(

][][][ nhnxny pp

where

1,.......1,0

][][][1

0

Nm

nmxnhnyN

nNpp

Page 28: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

22

Correlation

Correlation is a mathematical operation that is verysimilar to convolution. Just as with convolution,correlation uses two signals to produce a thirdsignal. This third signal is called the cross-correlation of the two input signals.

If a signal is correlated with itself, the resultingsignal is instead called the autocorrelation.

Page 29: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Correlation

,...2,1,0)()()(

,...2,1,0)()()(

lnylnxlr

llnynxlr

nxy

nxy

Where, rxy(l) is the correlation coefficients

Page 30: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Computation of correlation

)(lrxy

lM

ln

lnynx1

)()(

1

)()(N

ln

lnynx

MNl 0

1 NlMN

FOR l=1 to lmax{NL=M+1‐lIF(NL>N‐1) NL=NL‐1R(L)=0.0FOR(K=l TO NL{R(l)=R(l)+X(K)*Y(K‐l)}}

Page 31: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

25

Convolution vs. correlation

Convolution is the relationship between a system'sinput signal, output signal, and impulse response.

Correlation is a way to detect a known waveform ina noisy background.

The similar mathematics is only a convenientcoincidence.

Page 32: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

LTI System

Page 33: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Linear Time-Invariant Systems

Easiest to understand Easiest to manipulate Powerful processing capabilities Characterized completely by their response tounit sample, h(n), via convolution relationship Basis for linear filtering Used as models for speech production

Page 34: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Equivalent LTI Systems

Page 35: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Find y[n]?

x[n]

y[n]

Page 36: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

30

Direct Form I

• Transfer function of recursive LTI system

M

kk

N

kk knxbknyany

01

nvknyany

knxbnv

N

kk

M

kk

1

0

)(1

nxknwanwN

kk

M

kk knwbny

0

Page 37: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

31

Direct Form I

Page 38: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

32

Page 39: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

33

Frequency-Domain Representation of Discrete Signals and LTI Systems

LTI system( )h ncomplex‐valued 

exponencial signal

( ) j nx n e

impulse response

( ) ( ) ( )k

y n h k x n k

LTI system output

( )y n

Page 40: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

LTI system output:

( )( ) ( ) ( ) ( )

( ) ( )

j n k

k k

j k j n j n j k

k k

y n h k x n k h k e

h k e e e h k e

( ) ( )j n jy n e H e

Frequency response: ( ) ( )j j k

kH e h k e

Page 41: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

35

( )( ) ( )j j jH e H e e

( ) Re ( ) Im ( )j j jH e H e j H e

( ) ( )cos ( )sinj

k k

H e h k k j h k k

Re ( ) ( )cosj

k

H e h k k

Im ( ) ( )sinj

k

H e h k k

Page 42: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

36

Magnitude response:

2 2( ) Re ( ) Im ( )j j jH e H e H e

Im ( )( ) arg ( )

Re ( )

jj

j

H eH e arctg

H e

Phase response:

( )( ) dd

Group delay function:

Page 43: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

37

Comments on symmetry propertiesFor LTI systems with real-valued impulse response, the magnitude response, phase responses, the real component of and the imaginary component of

possess these symmetry properties:

The real component: even function of periodic with period

The imaginary component: odd function of        periodic with period 

( )jH e

2

2

Re ( ) Re ( )j jH e H e

Im ( ) Im ( )j jH e H e

Page 44: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

38

The magnitude response: even function of      periodic with period 

The phase response: odd function of       periodic with period 

( ) ( )j jH e H e

2

arg ( ) arg ( )j jH e H e

Consequence:

If we known and for , we can describe these functions ( i.e. also ) for all values of

( )jH e ( ) 0 ( )jH e

2

Page 45: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

39

( )jH e

24 3 2 3 4

24 3 2 3 4

Symmetry Properties

( )

EVEN

ODD

0

0

Page 46: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

40

Normalized Frequency

It is often desirable to express the frequency response of an LTI system in terms of units of frequency that involve sampling interval T. In this case, the expressions: 

( ) ( )j j k

kH e h k e

1( ) ( )

2j j nh n H e e d

are modified to the form:

( ) ( )j T j kT

kH e h kT e

/

/

( ) ( )2

Tj T j nT

T

Th nT H e e d

Page 47: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

41

is periodic with period , where is sampling frequency.

Solution: normalized frequency approach:

( )j TH e 2 / 2T F F/ 2F

/ 2 50F kHz 50kHz 3

1 3

20 10 2 0.450 10 5

xx

3

2 3

25 10 0.550 10 2

xx

100F kHz

1 20f kHz

2 25f kHz

Example:

Page 48: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Discrete Time Fourier Transform (DTFT)

Page 49: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Discrete Time Fourier Transform• Continuous time Fourier transform, when the signal is sampled: 

• Assuming• Discrete‐Time Fourier Transform (DTFT): 

DTFT is periodic in frequency with period of 2

Page 50: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Example

Page 51: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 52: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

DISCRETE FOURIER TRANSFORM (DFT)

Page 53: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

x[n] is the discrete signal FT is Xs(f)

x[n]

X(f)

Frequency domain sampling of the FTDFT

Page 54: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

eW Nj

N

/2

1

0][][

N

n

kn

NWnxkX

1

0][1][

N

k

kn

NWkXN

nx

DFT

IDFT

k=0,1,2,3…….N‐1

n=0,1,2,3…….N‐1

DFT is the set of N sample {X[k]} of the Fourier transform X() for a finite–duration sequence {x[n]} of length L<=N. the sampling of X() occurs atthe N equally spaced frequencies k=2k/N, k=0,1,2,3… N-1

Where WN is define as

Page 55: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

sincos je j

1

0][][

N

n

kn

NWnxkX

1

0

/2][N

n

Nknjenx

1

0)/2sin()/2][cos([

N

nNknjNknnx

f(k)=kfs/N

X[k]=Xreal[k]+j Ximag[k] ])[][][|][| 22 kkkkx XXX imgrealpower

][

][][ tan 1

k

kk

XXX

real

img

Page 56: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Properties of DET

Periodicity: if x[n] and X[k] are an N-point DFT then x[n+N]=x[n] for all nX[k+N]=X[k] for all k

Linearity: if

x1(n) X1(n)DFT

Nx2(n) X2(n)

DFT

N

a1x1(n)+a2x2(n) a1 X1(n)+a2 X2(n)DFT

N

Symmetry:

Page 57: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

DFT Shifting theorem:

)()( ln/2 lXl eX Nj

shifted

x[n] is shifted to the right by l sample

DFT Leakage:

DFT Resolution, Zero stuffing:DFT Magnitudes:When a real input signal contains a sine wave component of peak amplitudeA0 with an integral number of cycles over N input samples the outputmagnitude of the DFT for that particular sine wave is Mr where

Mr=A0*N/2If the DFT input is a complex sinusoid of magnitude A0 with an integral numberof cycles over N input samples the output magnitude of the DFT is Mc where

Mc=A0N

Page 58: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Fast Fourier Transform

let the computation of N=2v point DFT , split the N point datasequence into two N/2 point data sequence f1(n),f2(n)corresponding the even-number and odd-numberd samples of x(n)

f1(n)=x(2n), f2(n)=x(2n+1)

Thus f1(n) and f2(n) are obtained by decimating x(n) by a factorof 2 and hence the resulting FFT algorithm is called Decimating intime algorithm:

1

0)()(

N

n

kn

NWnxkX

oddn

kn

Nevenn

kn

N WW nxnx )()(

Page 59: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 60: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Decimation in Frequency

1

2/

12/

0)()()(

N

Nn

kn

N

N

n

kn

N WW nxnxkX

12/

0

2/12/

0)

2()(

N

n

kn

N

kN

N

N

n

kn

N WWW Nnxnx

)1(2/ kkN

NWNow

WkN

N

N

n

kNnxnxkX 2/

12/

0)2/()()( )1(

WkN

N

N

nNnxnxkX 2/

12/

0)2/()()2(

WkN

N

N

nNnxnxkX 2/

12/

0)2/()()12(

Page 61: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 62: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 63: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

)/2sin()/2cos(/2 NjNeW Nj

N

xC+j(‐S)

x0+jy0

x1+jy1 x’1+jy’1

x’0+jy’0

‐1

Page 64: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 65: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

C+j(‐S)

x0+jy0

x1+jy1 x’1+jy’1

x’0+jy’0

‐1x

Page 66: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Frequency spectrum

Page 67: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Spectrogram

Page 68: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Digital Filter

Page 69: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 70: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 71: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 72: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

A finite impulse response(FIR) filter is a discretelinear time-invariant system whose output is basedon the weighted summation of a finite number ofpast input.

1

0)()(

M

kk knxny b

Page 73: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

z‐1 z‐1 z‐1 z‐1

x x x x x

y(n)

x(n)

h(0) h(1) h(2) h(M‐2) h(M‐1)

1

0)()()(

M

kknxkhny

)1()1()2()2(.......)1()1()()0()( MnxMhMnxMhnxhnxhny

Page 74: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

)().()(*)()( mXmHnxkhny DFT

IDFT

X

IDFT DFT DFT DFT

Filter out put in frequency domain, Y(m)=H(m).X(m)

Filter out put in time domainy(n)=h(k)*x(n)

x(n)h(k)

Time domain

Frequencydomain

Page 75: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Algebraic  determination of time‐domain coefficients of low pass filter 

1. Develop an expression for the discrete frequency response H(m)

2. Apply that expression to the inverse DFT equation to get the time domain h(k)

3. Evaluate that h(k) expression as a function of time

Let Hd(w) is the frequency response of a low‐pass filter of time response hd(n)

ehH nj

ndd n

)()(

0

dn eHh nj

dd

)()(

The unit sample response obtained from the above equation is infinite induration and must be truncated at some point say n=M-1 for a FIR filter oflength M this truncation is equivalent to multiplying hd(n) by a windowfunction w(n).

h(n)= hd(n)w(n)

Page 76: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Window Function for FIR Filter Design

Name of Window Window functionBartlett(triangular)

Blackman

Hamming

Hanning

Kaiser

12

121

M

Mn

14cos08.0

12cos5.042.0

Mn

Mn

12cos46.054.0

Mn

12cos1

21

Mn

21

211

0

2

2

0

M

MnMI

I

Page 77: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Common Windows (Frequency)

Page 78: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ExampleLet the frequency response of a low‐pass filter as 

)(H de Mj 2/)1(1

0

c0

otherwise

A delay of (M‐1)/2 unit is incorporated into H(w) in anticipation of forcing the filter to be of length M

dn

c

c

ehMnjw

d

)

21(

21)(

21

)2

1(sin)(

Mn

Mnn

c

h

2

1,10

MnMn

/2

1c

Mh

Page 79: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Type of Window Approximate Transition width

of main Lobe

Peak Sidelobe

Rectangular 4/M -13

Bartlett 8/M -27

Hanning 8/M -32

Hamming 8/M -43

Blackman 12/M -58

Page 80: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Band-pass FIR filter design:

)().()( kkk shh shiftlpbp

High-pass FIR filter design:

)().()( kkk shh shiftlpbp

Page 81: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

IIR Design Methods

Impulse invariant transformation – match theanalog impulse response by sampling; resultingfrequency response is aliased version of analogfrequency response

Bilinear transformation – use a transformation tomap an analog filter to a digital filter by warping theanalog frequency scale (0 to infinity) to the digitalfrequency scale (0 to ); use frequency pre-warping to preserve critical frequencies oftransformation (i.e., filter cutoff frequencies)

Page 82: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Human Speech Production and

Source Filter model

Page 83: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Massage Planning

Speech Sound

Production

Rule of Grammar

Rule of Prosody

Physiological Constraints

Physiological Constraints

Input Information

LinguisticsLexical,

Syntactic, Semantic,Pragmatic

Para- LinguisticsIntentional, Attitudinal,

Stylistic

Non- LinguisticsPhysical, Emotional

Segmental and supra Segmental Features of

speechUtterance Planning

Motor Command Generation

Information manifestation in the segmental and suprasegmental features of speech

@ Fujisaki

Page 84: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

3

Silence Unvoiced

Voiced

State of the speech production source

Page 85: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Basics Definitions Speech is composed of a sequence of soundsSounds/Phonemes serve as a symbolicrepresentation of information to be shared betweenhumans (or humans and machines)

Arrangement of sounds is governed by rules oflanguage (constraints on sound sequences, wordsequences, etc)

Linguistics is the study of the rules of languagePhonetics is the study of the sounds of speech

Page 86: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Vocal tract —dotted lines in figure; beginsat the glottis (the vocal cords) and ends atthe lips• Consists of the pharynx (the connectionfrom the esophagus to the mouth) and themouth itself (the oral cavity)• Average male vocal tract length is 17.5 cm• Cross sectional area, determined bypositions of the tongue, lips, jaw and velum,varies from zero (complete closure) to 20 sqcmNasal tract — begins at the velum and endsat the nostrilsVelum —a trapdoor-like mechanism at theback of the mouth cavity; lowers to couplethe nasal tract to the vocal tract to producethe nasal sounds like /m/ (mom), /n/ (night),/ng/ (sing)

Page 87: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

MRI of Speech Human Production system (Prof. Shri Narayanan, USC)

Page 88: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

LUNG VOLUME

PHARYNX CAVITY

MOUTH CAVITY

NASAL CAVITY

Vocal Cords

LARYNY TUBE

TRACHEA TUBE

TOUNG HUMP

VELUM NOSE OUTPUT

MOUTH OUTPUT

Lungs and associated muscles acts as the source of air for exciting the vocal mechanism

1.When the vocal cords are tensed, the air flow causes them to vibrate, producing voiced sound.2. When the vocal cords are relaxed, in order to produce a sound the air flow either must pass through a constriction in the vocal tract and there by become turbulent, producing unvoiced sound or it can build up pressure behind a point of total closure within the vocal tract and when the closure is opened,the pressure is suddenly and abruptly released, causing a brief transient sound.

Schematic representation of the physiological mechanism of speech production

Page 89: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 90: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Bernoulli Oscillation Tensed Vocal Cords –Ready to Vibrate

Vocal Cords –Open for Breathing

Page 91: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Front

Back

Vocal Folds

Glottal slit

Cricoids Cartilage

Arytenoids Cartilage

Thyroid Cartilage

Breathing

Voiced

Unvoiced

Page 92: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Glottal Flow

Page 93: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

The Vocal Tract• The shape of the vocal tract transforms raw

sound from the vocal folds into recognizablesounds.

Page 94: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 95: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 96: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Women and Men

• The acoustics of male and female vowels differ reliablyalong two different dimensions:

1. Sound Source

2. Sound Filter

• Source--F0: Depends on length of vocal folds

Shorter in women higher average F0

Longer in men lower average F0

• Filter--Formants: Depend on length of vocal tract

shorter in women higher formant frequencies

longer in men lower formant frequencies

Page 97: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 98: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

17

To identify dissimilar sounds i.e., vowels, the ears are moresensitive to peaks in the signal spectrum. These resonantpeaks in the spectrum are called formants.

Formants are the characteristics partial that identify vowelsto the listeners.

Formant with lowest frequency is called F1, the second F2& the third F3. F1 & F2 are enough to disambiguate thevowel.

What is Formant??

Spectrographic view of vowel /i/

Page 99: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 100: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Articulatory and Acoustic phonetics

Page 101: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

20

Manners and Place of Articulation

Place of articulation: During the articulation the airstreamsthrough the vocal tract must be obstructed in some way. Theplace where the obstruction takes place is called the placeof articulation

Manner of articulation: Manner of articulation is concernedwith airflow ; the paths it take and the degree to which it isimpeded by vocal tract constrictions.

The consonants are classified depending on the place ofobstruction and manner of articulation./k/ Velar Un-aspirated unvoiced stopVowel sound specified in terms of the position of the tongueand the position of the lips.

/i/ High front Un-rounded

Page 102: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP21

If the glottis are closed then it isvoiced and if opened then it isunvoiced or voiceless.

Manners of Articulation due to State of the Glottis

Page 103: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Place of articulationa. Bilabial: Bilabial sounds are produced when the two lips make the

constriction

b. Labiodentals: These sounds are produced by contacting lower lipwith the upper teeth.

c. Dental: Dental sounds are produced by the constriction of tip or bladeof the tongue with the upper teeth.

d. Alveolar: The sound made by the tip or the blade of the tongue incontact against the alveolar ridge, which is the bony prominenceimmediately behind the upper teeth.

e. Post alveolar: The sound, which is articulated by the tip or the bladeof the tongue with the back area of the alveolar ridge.

f. Retroflex : Retroflex sounds are made when the tip of the tonguecurled back in the direction of the front part of the hard palate- in otherwords just behind the alveolar ridge. Depending on how far the tonguecurls back, retroflexed could be apico-postalveolar or apico-palatal.

Page 104: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

g. Palatal: This sound is produced when theconstriction is made by the front part of thetongue with the hard palate.

h. Velar: It refers to a sound made by the back ofthe tongue against the soft palate.

i. Uvular: This sound is produced when the backof the tongue touches the uvula.

j. Pharyngeal: It refers to a sound produced inthe pharynx, the tubular cavity, whichconstitutes the throat above the larynx.

k. Glottal: These are the sounds, which made inthe larynx due to the closure or narrowing ofthe glottis.

Page 105: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Manner of articulation

a) Plosive, or stopb) Nasal stopc) Fricatived) Affricatee) Lateralf) Approximantg) Trill:h) Flap and Tap 1. Voiced

2. Unvoiced3. Aspiration

Page 106: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

25

• Vowels: a) Oral vowels b) nasal vowels

• Dipthongs: Dipthongs is a gliding monosyllabic speech soundthat start at or near the articulatory position for onevowel and moves to or toward the position foranother

• Semivowels: Semivowels are vowel like nature. They are generally characterized by gluding transition in vocal tract area function between adjacent phonemes.

• Consonant: a. Nasal consonants. b. unvoiced fricatives.

c. Voiced fricative d. voiced and unvoiced stop/ Plosive

Classification of sound in linguistically distinct speech (phonemes)

Page 107: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 108: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 109: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 110: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

The Vowel Space

Page 111: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Different Vowels, Different Formants

• The formant frequencies of resemble the resonant frequencies of a tube that is open at one end.

• For the average man (ref: Peter Ladefoged):

• F1 = 500 Hz

• F2 = 1500 Hz

• F3 = 2500 Hz

• However, we can change the shape of the vocal tract to get different resonant frequencies.

• Vowels may be defined in terms of their characteristic resonant frequencies (formants).

Page 112: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Articulatory description of Vowels

Vowels have traditionally been described according to following pseudo-articulatory parameters:

1. Height (of tongue) (F1)

2. Front/Back (of tongue)(F2)

3. Rounding (of lips)

Page 113: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Lip rounding

Page 114: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Velum closed

Vocal Cord Open

Nasal PassagePlace of

Articulation (Velar)

Back of tongue (Articulator)

Page 115: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Vocal Cord Closed

Velum closed

Place of Articulation

(Velar)

Back of tongue (Articulator)

Page 116: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Vocal Cord Open

Velum closed

Nasal PassagePlace of Articulation (Post-alveolar)

Tongue tip curled back (Articulator)

Page 117: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Vocal Cord Closed

Velum closed

Nasal PassagePlace of Articulation (Post-alveolar)

Tongue tip curled back (Articulator)

Page 118: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Upper palate

Velum closed

Vocal Cords Open

Tongue Tip (Articulator)

Place of Articulation (Dental)

Page 119: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Upper palate

Velum closed

Vocal Cords Open

Both the Lips (Articulator)

Place of Articulation

(Bilabial)

Page 120: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Velum closed

Vocal Cords Open

Place of Constriction

(Post alveolar)

Page 121: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Velum closed

Vocal Cords Open

Place of Constriction

(Post alveolar)

Page 122: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Place of Constriction

(Post alveolar)

Place of release (Alveolar)

Velum closed

Vocal Cords Open

Page 123: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Place of Articulation (Bilabial)

Vocal Cords Closed

Velum Open

Page 124: Dr. Shyamal Kumar Das Mandal Assistant Professor ...
Page 125: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

EnglishBengal

Page 126: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 45

Page 127: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 46

Page 128: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 47

Page 129: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 48

Page 130: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 49

Page 131: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 50

Page 132: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 51

Page 133: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP52

Page 134: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 53

Page 135: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

Time Domain Shape

Page 136: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

55

F1 & F2 are primarily determined by the position oftongue. F1 has a higher frequency when the tongue islowered and F2 has a higher frequency when the tongueis forwarded.

Vowels are classified according to the height andposition of the tongue inside the mouth.

Position of Bangla Vowels in Cardinal Vowel Diagram

Classification of vowels

Bangla vowels

Page 137: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 56

Vowel-vowel Combination

Semi-vowel or Glide

Hiatus

Diphthong

Page 138: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 57

A. In continuous speech two vowels can come togetherin two different situations.

B. They may be in a single word.

C. They may be part of two adjacent words i.e., oneword ends with a vowel and the next word starts witha vowel.

D. If the two vowels are within a single word, they mayeither be in two distinct syllables, or may merge intoone syllable.

Vowel-Vowel Combination

Back

Examples :

Page 139: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP5808/21/17

A diphthong is a monosyllabic vowel combinationinvolving a quick but smooth movement from onevowel to another, often interpreted by listeners as asingle vowel sound or phoneme.

It is a sequence of two different or same vowels thatare part of a single syllable. Usually one of the vowelsis stronger than the other.

.Examples:

Diphthong

Bangla Word :Bangla Word :

Page 140: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP08/21/17 59

When two vowels coming together without any contraction orelision are pronounced separately as distinct from Diphthongs they aretermed as hiatus.

Hiatus may be of two types:

1) Internal Hiatus which occurs within a word.

Example:

Hiatus

2) External Hiatus which refers to the break between twosuccessive words. In this situation the first word ends with a vowel andthe second word starts with a vowel.

Example:

Bangla Word :

Bangla Sentence :

Page 141: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP6008/21/17

Semi-vowel refers to a sound functioning as a consonant butlacking the PHONETIC characteristics normally associated withconsonants.

Its QUALITY is phonetically that of a vowel; though itsDURATION is much less than that typical of vowel.

Examples:

Semi-vowel or Glide

Bangla Word :

Bangla Word :

Bangla Word :

Page 142: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP6108/21/17

Steady State Duration of / a/

Transition with semi-vowel ( j)

Semi-vowel after a vowelVowel-semivowel combination (V-j) consists of transitional durationwith semivowel along with the preceding vowel’s steady stateduration.

PlaySpectrographic View of Bangla Word with V-j combination

Page 143: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP62

Steady State Duration of

/ o/

Steady State Duration of

/ o/

Transition with Semivowel / j/

Play

Vowel-semivowel-vowel combination consists of transitional durationwith semivowel along with the preceding and succeeding vowels’ steadystate duration.

Semi-vowel in between two vowels

Spectrographic View of Bangla Word ( ) with V-j-V combination

Page 144: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Page 145: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

Consonents Manner of ArticulationS/N

Place of Articulation

Unvoiced VoicedUn-

AspiratedAspirated Un-

AspiratedAspirated

1 Velar

Stop

/k/ /kh/ /g/ /gh/

2 Post-alveolar (Retroflex ) /ʈ/ /ʈʰ/ /ɖ/ /ɖʰ/

3 Dental /t/ /th/ /d/ /dh/

4 Bilabial /p/ /ph/ /b/ /bh/

5 Alveolar -Post alveolar Affricate /ʧ/ /ʧʰ/ /ʤ/ /ʤh/

6 Alveolar

Fricative

/s/

7 Post alveolar /ʃ/

8 Glottal /h/ //

9 Velar

Nasal Murmur

/ŋ/

10 Palatal /ɳ/

11 Dental /n/

12 Bilabial /m/

Page 146: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

S/N

Place of Articulati

on

Manner of ArticulationUnvoiced Voiced

Un-Aspirated

Aspirated Un-Aspirated

Aspirated

13 Dental Lateral /l/

14 Alveolar Trill /r/

15 Post alveolar

Retroflex Flap /ɽ/ /ɽh/

16 Palatal Approximant

/j/

17 Bilabial /w/

Vowel1 Back vowel Close, Rounded /u/

2 Back vowel Close-mid, Rounded /o/

3 Back vowel Open, Rounded /ɔ/

4 Front vowel Open, Unrounded /a/

5 Front vowel Open-mid, Unrounded /æ/

6 Front vowel Close-mid, Unrounded /e/

7 Front vowel Close, Unrounded /i/

Page 147: Dr. Shyamal Kumar Das Mandal Assistant Professor ...

ET60007 © CET, IITKGP

TUTORIAL

1.Write the place and manner of articulation of the following phoneme/k/, /g/,/u/,/gh/, /ɽ/, /ʃ/

2. Write out the phonetic transcription for the following words:/she/, /phonetic/, /marks/, /speech/,

How many syllable is present in each of the above word.

3. Draw Schematic representation of the physiological mechanism of speech productionsystem and explain how the a voiced sound is produce.

4. A voiced operated lift operation is designee using the following wordsa. stop, b. up, c. down d. floor e. first f. second g. third h. fourth and i. ground.

Figure 1 shows wideband spectrograms of one version of each of these words. Using your knowledge of acoustic phonetics, determine which wideband spectrogram corresponds to which word.

5. The following waveform is for the utterance /kolkata/ and the waveform samples are at asampling rate of FS =22050 Hz. Segment the waveform into regions of "Voiced Speech (V)"and "Non-Voiced Speech (N)".

6. Which formant frequency is related to tongue height and which formant related to tougueposition

7. Why the child speech has high F0 and formant compare to a adult