Analysing voice quality

John Kane

April 30, 2010

John Kane () Analysing voice quality April 30, 2010 1 / 18

Voice quality (VQ)

Mainly a consequence of the vibration of the vocal folds.

Overall timbre of a person’s voice (organic setting and dynamic shifts).

VQ not limited to pitch and loudness.

Voice quality

Laver sought to provide quantitative physiological/acoustic descriptions of VQ (1980).

VQs: breathy, whispery, creaky, harsh, falsetto, modal.

In real speech these VQs exist on continuous scales and in combination with others.

Voice quality examples

Voice quality (VQ)

Reveals information on speaker’s state and attitude.

Infants already sensitive to different VQs.

Mackenzie Beck (2005): VQ used before understanding of linguistic content.

Voice quality in speech communication

Contrastive linguistic purpose in some languages.

Gujarati: “twelve” vs “outside” (breathy)

Danish: “hun” vs “hund” (creaky)

Status, popular trends.

Extralinguistic, active listening (grunts etc.).

Prosodic component in neutral running speech.

Potentials of VQ/glottal source in speech technology

Improvement of naturalness in parametric speech synthesis (Cabral 2008, Raitio 2008).

Potential for more flexible/expressive speech synthesis

Ability to aid emotion detection and paralinguistic annotation.

Difficulties measuring VQ

As listeners we are very sensitive to variation in VQ.

Difficult job for computers.

Hidden position of the vocal folds.

Vocal Folds

Robust extraction of the glottal source is a difficult job for signal processing.

Electroglottography (EGG)

Inverse filtering
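
The slide gives no detail on the inverse filtering step. As a rough illustration only, the sketch below uses plain linear-prediction (LPC) inverse filtering: estimate an all-pole filter from the signal, then filter the signal with its inverse to approximate the excitation. This is a simpler stand-in, not the automatic method (Alku 2002) named later in the talk. All function names and the toy signal (an impulse train through one resonance) are assumptions made for the example.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (zero outside)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] * x[n - k], i.e. x filtered by A(z)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synth_all_pole(excitation, a):
    """Toy signal model (assumption): x[n] = e[n] - a1*x[n-1] - a2*x[n-2] ..."""
    x = []
    for n, e in enumerate(excitation):
        x.append(e - sum(a[k] * x[n - k]
                         for k in range(1, len(a)) if n - k >= 0))
    return x


# Toy example: impulse train (stand-in for glottal pulses) through one resonance.
true_a = [1.0, -2 * 0.9 * math.cos(math.pi / 4), 0.81]
excitation = [1.0 if n % 100 == 0 else 0.0 for n in range(800)]
speech = synth_all_pole(excitation, true_a)

est_a = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(speech, est_a)    # approximates the excitation
```

On real speech the excitation is not an impulse train and the vocal tract is not a single resonance, which is one reason robust glottal source extraction is hard.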

Parameterisation

Time based measurements (LF model)

[Figure: waveform illustrating the time-based (LF model) measurements; x-axis: Time (ms), y-axis: Amplitude.]

Adv: Related to physiology.

DisAdv: Sensitive to noise and phase.

Frequency domain measurements

Adv: Avoids phase issues.

DisAdv: Existing parameters strongly correlated.
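
Time-based LF model measurements are often summarised as dimensionless R-parameters, which is presumably what the Ra, Rg and Rk labels later in the talk refer to. The sketch below assumes the commonly used definitions Rg = T0 / (2 Tp), Rk = (Te - Tp) / Tp and Ra = Ta / T0, with all times measured from the glottal opening instant. The function name and the example timings are hypothetical.

```python
def lf_r_parameters(t0, tp, te, ta):
    """Convert LF timing points (all in the same unit, e.g. ms) to R-parameters.

    t0: fundamental period
    tp: time of peak glottal flow
    te: time of main excitation (negative peak of the flow derivative)
    ta: effective duration of the return phase
    """
    if not (0 < tp < te < t0) or ta <= 0:
        raise ValueError("expected 0 < tp < te < t0 and ta > 0")
    return {
        "Rg": t0 / (2.0 * tp),   # normalised glottal frequency
        "Rk": (te - tp) / tp,    # pulse skew
        "Ra": ta / t0,           # normalised return-phase duration
        "OQ": te / t0,           # open quotient, equals (1 + Rk) / (2 * Rg)
    }


# Hypothetical pulse: 10 ms period, peak flow at 4 ms, excitation at 6 ms.
params = lf_r_parameters(t0=10.0, tp=4.0, te=6.0, ta=0.5)
```

Because each value depends on locating individual time points in the waveform, small errors from noise or phase distortion carry straight into the parameters.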

Our parameterisation system

Recorded speech → automatic inverse filtering (Alku 2002) → codebook search for initial values → two-part optimisation → model parameters.

[Figure: speech waveform and voice source waveform; x-axis: Time (ms).]

[Figure: spectral optimisation, showing the voice source spectrum and the fitted model; x-axis: Frequency (Hz), y-axis: Amplitude (dB).]

Parameter labels shown: EE, Ra, Rg, Rk, F0.
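
The slides do not say how the codebook search is carried out. One plausible reading, sketched here purely as an assumption, is a nearest-neighbour lookup: each codebook entry pairs a parameter set with a precomputed model spectrum, and the entry closest to the measured voice source spectrum supplies the initial values for the optimisation. The distance measure (mean squared difference in dB), the function names and the toy codebook values are all hypothetical.

```python
def spectral_distance(spectrum_a_db, spectrum_b_db):
    """Mean squared difference between two spectra sampled at the same bins."""
    if len(spectrum_a_db) != len(spectrum_b_db):
        raise ValueError("spectra must have the same number of bins")
    return sum((a - b) ** 2
               for a, b in zip(spectrum_a_db, spectrum_b_db)) / len(spectrum_a_db)


def codebook_search(target_db, codebook):
    """Return the (params, spectrum) entry whose spectrum is closest to target."""
    return min(codebook, key=lambda entry: spectral_distance(target_db, entry[1]))


# Toy codebook with made-up parameter sets and 4-bin spectra in dB.
codebook = [
    ({"Ra": 0.10, "Rk": 0.45, "Rg": 0.90}, [-10.0, -30.0, -50.0, -70.0]),
    ({"Ra": 0.03, "Rk": 0.35, "Rg": 1.10}, [-10.0, -22.0, -34.0, -46.0]),
    ({"Ra": 0.01, "Rk": 0.25, "Rg": 1.40}, [-10.0, -16.0, -22.0, -28.0]),
]
measured = [-10.0, -21.0, -35.0, -45.0]
initial_params, _ = codebook_search(measured, codebook)
```

Starting the optimisation from a nearby codebook entry, rather than from fixed defaults, reduces the chance of converging on a poor local fit.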

Our Interspeech/Speech Communication submission

Description of our frequency domain parameterisation approach.

Finnish vowels: /A e i o u y æ ø/, 11 speakers

Voice qualities: breathy, modal, pressed.

Evaluation

Robustness against simulations of difficult conditions.

Relative change: sensitivity of parameters.

Coefficient of variation: pulse-to-pulse variation.

Test conditions: clean signal; signal with additive noise (SNR = 45 dB); signal with additive noise (SNR = 30 dB); signal with recording system distortion.

Ability to discriminate voice qualities.

Explained variance: regression analysis.

Classification: linear discriminant analysis.
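
The robustness measures named above can be sketched as follows. The slides do not give the exact formulas, so this assumes the usual definitions: relative change as the absolute difference from the clean-signal value in percent, and coefficient of variation as standard deviation over mean in percent (population standard deviation is used here). The noise is white Gaussian, scaled to hit the requested SNR exactly. Function names are hypothetical.

```python
import math
import random
import statistics


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so the result has the requested SNR."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * v for s, v in zip(signal, noise)]


def relative_change(clean_value, distorted_value):
    """Percentage change of a parameter relative to its clean-signal value."""
    return 100.0 * abs(distorted_value - clean_value) / abs(clean_value)


def coefficient_of_variation(values):
    """Pulse-to-pulse variation in percent: standard deviation over mean."""
    return 100.0 * statistics.pstdev(values) / abs(statistics.mean(values))


# Toy usage: a 100 Hz sine at 16 kHz as a stand-in for a recorded vowel.
clean = [math.sin(2 * math.pi * 100 * n / 16000) for n in range(1600)]
noisy_30db = add_noise_at_snr(clean, snr_db=30.0)
```

A parameter estimated once on `clean` and once on `noisy_30db` would then be compared with `relative_change`, and its spread across successive pulses with `coefficient_of_variation`.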

Overall results

Clearly better robustness against distortions imposed by the recording system.

[Figure: three panels (Ra, Rk, Rg) showing relative change (%) for the breathy, modal and pressed voice qualities.]

Overall results

Generally less sensitive to moderate levels of additive noise imposed on signals.

High noise levels at times affected robustness.

Overall results

Clearly higher R² scores for individual parameters.

[Figure: R-squared values (%) for the parameters Rg, Rk and Ra; new system vs time system.]
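
For a single parameter regressed on voice quality category, the explained variance (R²) equals the between-group sum of squares divided by the total sum of squares. The sketch below computes it that way. The slides do not spell out the regression setup, so treating voice quality as a categorical predictor is an assumption, and the numbers in the example are made up.

```python
def explained_variance(groups):
    """R-squared of one parameter regressed on a categorical grouping.

    groups: dict mapping a voice-quality label to a list of parameter values.
    """
    values = [v for group in groups.values() for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return ss_between / ss_total


# Hypothetical values of one parameter, three tokens per voice quality.
toy = {
    "breathy": [1.0, 2.0, 3.0],
    "modal": [4.0, 5.0, 6.0],
    "pressed": [7.0, 8.0, 9.0],
}
r_squared = explained_variance(toy)
```

A value near 1 means the voice quality categories account for most of the variation in that parameter; a value near 0 means the parameter barely separates them.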

Table 1: Confusion matrix of classification scores (%) of the three voice qualities using the two systems.

            Spec               Time
       Bre  Neu  Pre      Bre  Neu  Pre
Bre     79   20    1       76   22    2
Neu     32   47   21       42   43   15
Pre      6   24   70        8   28   64

Higher classification scores.
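
One way to put a single number on "higher classification scores" is the mean of the diagonal of each confusion matrix in Table 1. The sketch below assumes each row is one class with percentages summing to 100 and that the three classes are equally represented, which the slides do not state. The variable and function names are hypothetical.

```python
LABELS = ("Bre", "Neu", "Pre")

# Scores (%) copied from Table 1, one row per class.
SPEC = [[79, 20, 1], [32, 47, 21], [6, 24, 70]]
TIME = [[76, 22, 2], [42, 43, 15], [8, 28, 64]]


def mean_class_accuracy(confusion):
    """Average of the diagonal entries of a row-normalised confusion matrix."""
    return sum(confusion[i][i] for i in range(len(confusion))) / len(confusion)


spec_accuracy = mean_class_accuracy(SPEC)   # about 65.3
time_accuracy = mean_class_accuracy(TIME)   # 61.0
```

Under these assumptions the Spec system scores about 4 percentage points higher overall, with the largest gain on the pressed class (70 vs 64).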

Some thoughts

New method may overcome some of the issues which have hampered automated glottal source analysis.

Produced vowels vs running speech.

Criteria to be defined to maximise the probability of robust parameter extraction.

Extension of islands of reliability (Mokhtari & Campbell 2002)

Future work

Applying new method to analysis of glottal source dynamics with Dr. Yanushevskaya.

Further work with HMM-based classification of voice qualities with Mark Kane.

Possible collaboration with Catharine Oertel and Prof. Campbell in analysis of voice quality from naturalistic speech recordings.

Open to other collaborations!
