Building a corpus of pathological speech

Post on 06-Feb-2016


Catherine Middag

Jean-Pierre Martens

Gwen Van Nuffelen

Marc De Bodt

Dutch Corpus of Pathological and Normal Speech

Speakers                      N
Normal (N)                  119
Dysarthria (D)              102
Hearing impairment (H)       47
Laryngectomy (L)             45
Cleft (C)                    39
Articulation disorders (A)   16
Voice disorder (VD)           8
Glossectomy (G)               1
Total                       377

Dysarthria: disturbed muscular control due to damage of the nervous system, resulting in weak, slow, imprecise, uncoordinated movements


• TL (total laryngectomy): surgical removal of the larynx and separation of the trachea from the mouth, nose, and esophagus
  • Speech types: TE (tracheoesophageal), E (esophageal), electrolarynx (Servox)
• PL (partial laryngectomy): partial removal of laryngeal structures, vocal folds


Speakers

• Native speakers of Dutch
• Adequate language, cognitive, visual and hearing* abilities

Recordings

• Natural, quiet environment ~ clinical setting
• No sound-treated box
• Mini-disc (Sony, MZ-R700)
• Microphone
  • Sony (mouth-microphone distance: 30 cm)
  • Shure headset
• Transferred to a notebook as a wave file (mono, 44 kHz)
  • 16 kHz

Type of samples

Sample                                  N
Dutch Intelligibility Assessment      357
Articulation assessment                21
Sentences                             211
Text                                  172
Text Marloes                          221
Spontaneous speech                     39
Semi spontaneous speech               136
Sustained vowel                       216
Diadochokinetic rate                  214
Formant transition                    212

Dutch Intelligibility Assessment (DIA)

Intelligibility at phoneme level

50 consonant-vowel-consonant (CVC) words

3 subtests:

• A: initial consonants (19 words)
• B: final consonants (15 words)
• C: medial vowels/diphthongs (16 words)

Balanced mix of existing and non-existing (well pronounceable) words

Large pool of test items: 25 lists per subtest, so 25*25*25 = 15,625 different tests
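The arithmetic behind the item pool can be checked with a minimal sketch. It assumes that a test is assembled by picking one list per subtest independently, which is what the 25*25*25 figure suggests; the function names are illustrative and not part of the DIA itself.

```python
import random

# Figures taken from the slides: words per subtest and lists per subtest.
WORDS_PER_SUBTEST = {"A": 19, "B": 15, "C": 16}
LISTS_PER_SUBTEST = 25


def count_distinct_tests():
    """One list is chosen independently for each subtest (assumption)."""
    return LISTS_PER_SUBTEST ** len(WORDS_PER_SUBTEST)


def draw_test(rng):
    """Pick one list per subtest, named like the slide examples (A3, B22, C11)."""
    return {
        subtest: f"{subtest}{rng.randint(1, LISTS_PER_SUBTEST)}"
        for subtest in WORDS_PER_SUBTEST
    }


total_words = sum(WORDS_PER_SUBTEST.values())
n_tests = count_distinct_tests()
example = draw_test(random.Random(0))

print(total_words)  # 50
print(n_tests)      # 15625
print(example)      # one list label per subtest
```

Passing a seeded `random.Random` keeps the draw reproducible, which may be convenient when the same test has to be regenerated for a second listener.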

List A3

1. vop  2. ziep  3. fuis  4. deek  5. koen  6. hom  7. dar  8. paam  9. mil  10. boos  11. son  12. geur  13. nee  14. taf  15. oes  16. loon  17. ruk  18. joef  19. wout

List B22

1. geen  2. diem  3. zoem  4. daai  5. jog  6. peef  7. zaar  8. paat  9. tik  10. vang  11. boop  12. lieuw  13. roos  14. toe  15. riel

List C11

1. gul  2. zuut  3. det  4. wok  5. waan  6. heun  7. nout  8. vees  9. meul  10. wiel  11. sas  12. tuik  13. oet  14. rood  15. min  16. deil

DIA

16-year-old girl, stroke, dysarthria, PI: 40%

79-year-old male, TL, TE-speech, PI: 68%

DIA

Item 1, fixed frame: .op
Response options: ø b d f g h j k l m n p r s t v w z

1. dop

2. nuis

3.

top

List A10

Intelligibility: percentage of phonemes correctly understood
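As a minimal sketch of this definition, the score can be computed by comparing each target phoneme with the phoneme the listener reported. The function name and the four example items below are hypothetical and are not taken from the corpus.

```python
def phoneme_intelligibility(targets, perceived):
    """Return the percentage of target phonemes that were correctly understood."""
    if len(targets) != len(perceived):
        raise ValueError("targets and perceived must have the same length")
    if not targets:
        raise ValueError("at least one item is required")
    correct = sum(t == p for t, p in zip(targets, perceived))
    return 100.0 * correct / len(targets)


# Hypothetical example: four items, one initial consonant misheard (d -> t).
targets = ["d", "n", "k", "m"]
perceived = ["t", "n", "k", "m"]
score = phoneme_intelligibility(targets, perceived)
print(score)  # 75.0
```

A complete DIA would use 50 items (19 + 15 + 16), one target phoneme per word.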

DIA

[Figure: histogram of the DIA scores. x-axis: score; y-axis: number of persons]

Annotations DIA

• Praat
• 2 tiers
  • Tier 1: target word
  • Tier 2: fixed frame + perceived phoneme
    • .VC
    • CV.
    • C.C
• Orthographic transcriptions

List A. Target phoneme: initial consonant. Fixed frame: .VC
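The two-tier scheme can be illustrated with a small sketch. It assumes that the dot in the fixed frame marks the slot of the target phoneme and that "ø" stands for "no phoneme perceived"; the helper names are hypothetical, and the code works on plain strings rather than reading Praat TextGrid files.

```python
def fill_frame(frame, phoneme):
    """Insert the perceived phoneme into the single '.' slot of a fixed frame."""
    if frame.count(".") != 1:
        raise ValueError("a fixed frame must contain exactly one '.' slot")
    # Assumption: 'ø' means that no phoneme was perceived in the slot.
    return frame.replace(".", "" if phoneme == "ø" else phoneme)


def is_correct(target_word, frame, perceived_phoneme):
    """Compare tier 1 (target word) with tier 2 (frame + perceived phoneme)."""
    return fill_frame(frame, perceived_phoneme) == target_word


# Hypothetical listener responses for a list A item with frame '.op'.
print(fill_frame(".op", "t"))         # top
print(is_correct("dop", ".op", "d"))  # True
print(is_correct("dop", ".op", "t"))  # False
```

Because the frame is fixed, only the target phoneme is scored, which keeps the annotation at phoneme level as described above.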

Articulation assessment

• Children
• Insufficient reading skills
• Logo-Art (Baarda et al., 2001)
• Picture naming test
• Annotations:
  • Orthographic
  • Tier 1: target
  • Tier 2: perceived utterance (no fixed frame)

Sentences

Motor Speech Profile (Kay Elemetrics)

‘Wil je liever de thee of de borrel?’ (Would you rather have the tea or the drink?)
‘Na nieuwjaar was hij weeral hier’ (After New Year he was here again)

N = 211
Orthographic transcriptions
Tier 1 and tier 2; no word boundaries

man, no speech pathology

18-year-old male, congenital dysarthria

Text Marloes and Text

• Text ‘Papa en Marloes’
  • standardized text
  • balanced representation of Dutch phonemes
  • often used in clinical practice
• Text
  • different texts with the same reading level
• Annotations
  • orthographic transcriptions
  • 2 tiers
  • boundaries between sentences

(Semi) Spontaneous speech

• Spontaneous
• Semi spontaneous: randomly selected sequence of pictures
• No annotations available

Future

• Gradually increase the number of samples
• DIA validation SPACE intelligibility assessment
• DIA sentence level: > 200 control speakers, 3*6 sentences + annotations + pathological samples

Thank you!