D.Sc. Jarmo Malinenspeech.math.aalto.fi/pdf/ClinPhon.pdf · 2018. 5. 31. · Glottal pulse...

Glottal pulse modelling for vowel simulations

D.Sc. Jarmo Malinen

Aalto University, School of Science,Department of Mathematics and Systems Analysis

University of HelsinkiMarch, 13th, 2015

Human voice productionSimplified vowel production:

Mouth︸︷︷︸(exterior load)

↑Vocal tract︸︷︷︸

(filter)

↑Vocal folds︸︷︷︸

(source)

Flanagan, J. L. (1972). Speech Analysis Synthesis and Perception,Springer-Verlag.

• Vocal tract (VT) shape changes, and there are feedbacks.

• Not all speech sounds originate in vocal folds.

Purpose of the glottal pulse model

The glottal pulse model is designed for two purposes:

• As an adaptable and tunable glottal source forhigh-resolution acoustic model of vowel production “DICO”.

• As a development platform for speech processing algoritms.The glottal inverse filtering (GIF) is an example of this.

DICO is based on computational acoustics and MRI data.It has been originally designed for predicting phonetic outcomes ofmaxillofacial surgery.

Modelling speech requires data

Simultaneous speech recording during 3D MR imaging.

MRI is required for modelling.Speech signal is required for model validation.

VT surfaces and area functions from MRI

The geometric data for acoustic modelling is obtained fromMagnetic Resonance Images by custom segmentation software.

Geometries of Finnish vowels

Vocal folds

Vocal folds vibrate due to a self-sustained aerodynamic cut-offeffect as a result of air flow from lungs.

Vocal tract as Webster’s resonator

Equations for the Webster’s velocity potential ψ = ψ(s, t):ψtt = c(s)2

A(s)∂∂s

(A(s)∂ψ∂s

)− 2παW (s)c(s)2

A(s)∂ψ∂t in vocal tract s ∈ [0, `]

ψ(`, t) = 0 at mouth s = `

−cψs(0, t) + ψt(0, t) = 2√

cρA(0) u(t) on vocal folds s = 0.

c speed of sound

ρ density of air

α dissipation to tissues

A(0) Intersection areajust above glottis

A(s) VT area functionfrom MRI

Both Σ(s) and W (s) are corrections due to VT curvature.

Drawbacks of Webster’s resonator

A high-resolution vocal tract acoustics model cannot be based onWebster’s resonator:

• For three lowest formants under 4 kHz, Webster’s resonator isvery accurate and numerically efficient.

• Webster’s resonator does not see cross-mode resonances inmouth cavity for, e.g., [A].

• The resonance structure of valleculae and piriformis sinusesshow up at 4+ kHz as a F4-cluster. Not accounted byWebster’s resonator at all.

However, Webster’s resonator is sufficient for producing glottalpulse for a separate high-resolution model.

A simple source-filter model (1)Coupling the glottis and flow model to Webster’s resonator, we geta simple vowel synthesizer.

It does give (some kind of) vowels with correct formants, but doesnot really explain speech biophysics at all.

A simple source-filter model (2)The vocal folds interact with the glottal flow in both directions.

This feedback produces sustained oscillations heard as sound.

Adding acoustic feedbackThe next improvement is to let the vocal tract sound pressureaffect the vocal fold vibrations.

This feedback is called source-filter coupling by I. Titze.

Adding acoustic feedbackEven though lower airways under glottis are not actively used inspeech, they do load the vocal folds in a similar manner as thevocal tract.

Now we have a fairly complete glottal pulse generator based onspeech biophysics and geometric data from MRI.

Multiphysics of vowel production: DICOThe full model feeds the glottal pulse to a high-resolution 3Dacoustic model to obtain all the details.

Resonators TerminationsCouplingsVocal folds Glottal Flow

Tuning to

isospectrality

Vocal tract resonator

by the Wave Equation

MRI and

post-processing

- surface models

- area functions

- centrelines

Subglottal

Webster resonator

Vocal tract

Webster resonator

Perturbation

velocity at glottis

Exterior

acoustics

Lung

impedance

Mouth

impedance

Constricted

ow

Aero-

dynamic force

Subglottal

pressure

Low-order

mass-spring

system with

- tissue losses,

- Hertz impact

model for

vocal folds

collisions

Bernoulli ow with

- Hagen-Poiseulle

viscous pressure loss,

- a term for turbulence

losses

Glottal pulse generator

k

Counter pressure

Counter pressure

3D acoustic simulator

Extension

to control

surface

Compromise between accuracy and numerical efficiency.

Environmental acoustics

Even in otherwise empty space,the face of the speaker acts asa reflecting surface.

Speech in constrained spacediffers from speech in open space.

Speech data recorded insidethe MRI coil is like singing ina bucket.

Dealing with the environmentshould be done in a numericallyefficient way as it is, after all,a secondary feature.

Simulated glottal signals for [A, i]

0 0.005 0.01 0.015 0.020

5x 10

−4

U (

m3/s

)

0 0.005 0.01 0.015 0.020

5x 10

−6

Ag (

m3)

0 0.005 0.01 0.015 0.02−4000

0

4000

psp (

Pa

)

0 0.005 0.01 0.015 0.02−4000

0

4000

pc (

Pa

)

0 0.005 0.01 0.015 0.02−1000

0

1000

psb (

Pa

)

Time (s)

0 0.005 0.01 0.015 0.020

5x 10

−4

U (

m3/s

)

0 0.005 0.01 0.015 0.020

1x 10

−5

Ag (

m3)

0 0.005 0.01 0.015 0.02

−2000

0

2000

psp (

Pa)

0 0.005 0.01 0.015 0.02

−2000

0

2000

pc (

Pa)

0 0.005 0.01 0.015 0.02

−2000

0

2000p

sb (

Pa)

Time (s)

Signals in black are computed without subglottal resonator.

Technological side products

The novel technological solutions may have value that isindependent of the modelling work:

• Segmentation and registration of MRI material from VTcan be used for other purposes: e.g., numerical studies of flowmechanics in upper airways, related to Obstructive SleepApnea or asthma.

• High-quality speech sampling during dynamic MRI forstudies of articulation, singing, etc.

• MRI experiments during prolonged vowel utterances.

Almost all of the potential applications are uncharted territoryright now... and we are interested in cooperation.

The End is Nigh

Thanks for your patience.

Any questions?

http://speech.math.aalto.fi

Joint work of groups in Aalto University, University of Helsinki, Universityof Turku, Turku University Central Hospital, and Medical Imaging Centreof South-West Finland.

http://speech.math.aalto.fi

“Opera magna”

A. Hannukainen, T. Lukkari, J. Malinen, and P. Palo.

Vowel formants from the wave equation. Journal of the Acoustical Society of America, 122(1):EL1–EL7,2007.

A. Aalto, D. Aalto, J. Malinen, T. Murtola, and M. Vainio.

Modal locking between vocal fold and vocal tract oscillations: Simulations in time domain. arXiv:1211.4788,2012–2015 (and still going on and on).

A. Aalto and J. Malinen.

Composition of passive boundary control systems. Mathematical Control and Related Fields, 3(1):1–19,2013.

T. Lukkari and J. Malinen.

Webster’s equation with curvature and dissipation. arXiv:1204.4075, 2013.

D. Aalto, O. Aaltonen, R.-P. Happonen, P. Jaasaari, A. Kivela, J. Kuortti, J. M. Luukinen, J. Malinen,

T. Murtola, R. Parkkola, J. Saunavaara, and M. Vainio.Large scale data acquisition of simultaneous MRI and speech. Applied Acoustics 83(1):64–75, 2014.

A. Aalto, T. Lukkari, and J. Malinen.

Acoustic wave guides as infinite-dimensional dynamical systems. ESAIM: Control, Optimisation andCalculus of Variations, 2014.

T. Lukkari and J. Malinen.

A posteriori error estimates for Webster’s equation in wave propagation. Journal of Mathematical Analysisand Applications, 2015.

D.Sc. Jarmo Malinenspeech.math.aalto.fi/pdf/ClinPhon.pdf · 2018. 5. 31. · Glottal pulse...

Documents

Transcript of D.Sc. Jarmo Malinenspeech.math.aalto.fi/pdf/ClinPhon.pdf · 2018. 5. 31. · Glottal pulse...