Building a corpus of pathological speech

Catherine Middag, Jean-Pierre Martens, Gwen Van Nuffelen, Marc De Bodt


Page 1: Building a corpus of pathological speech

Building a corpus of pathological speech

Catherine Middag

Jean-Pierre Martens

Gwen Van Nuffelen

Marc De Bodt

Page 2: Building a corpus of pathological speech

Dutch Corpus of Pathological and Normal Speech

Speakers                      N
Normal (N)                  119
Dysarthria (D)              102
Hearing impairment (H)       47
Laryngectomy (L)             45
Cleft (C)                    39
Articulation disorders (A)   16
Voice disorder (VD)           8
Glossectomy (G)               1
Total                       377

Dysarthria: disturbed muscular control due to damage of the nervous system, resulting in weak, slow, imprecise, uncoordinated movements

Page 3: Building a corpus of pathological speech

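TL (total laryngectomy): surgical removal of the larynx and separation of the trachea from the mouth, nose, and esophagus

Speech types after TL: TE (tracheoesophageal), E (esophageal), electrolarynx (Servox)

PL (partial laryngectomy): partial removal of laryngeal structures, vocal folds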

Page 5: Building a corpus of pathological speech

Speakers

• native speakers of Dutch
• adequate language, cognitive, visual and hearing* abilities

Page 6: Building a corpus of pathological speech

Recordings

• Natural, quiet environment ~ clinical setting
• No sound treated box
• Mini-disc (Sony, MZ-R700)
• Microphone
  • Sony (mouth-microphone distance: 30 cm)
  • Shure head set
• Transferred to a notebook wave file (mono, 44 kHz)
• 16 kHz
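A quick way to confirm that a transferred file has the format listed above is to read its WAV header. This is a minimal sketch using Python's standard `wave` module, not the authors' tooling. The helper names `describe_wav` and `matches_expected_format` are hypothetical, and reading "44 kHz" as 44100 Hz with 16-bit samples is an assumption, since the slides give neither detail.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, sample rate and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_expected_format(info, sample_rate_hz=44100):
    """True for mono files at the expected rate (44100 Hz is an assumption)."""
    return info["channels"] == 1 and info["sample_rate_hz"] == sample_rate_hz


# Demo on a synthetic file: one second of 16-bit mono silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes = 16-bit samples (assumed bit depth)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

info = describe_wav(demo_path)
print(info, matches_expected_format(info))
```

Passing `sample_rate_hz=16000` to `matches_expected_format` would check files against the 16 kHz rate instead.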

Page 7: Building a corpus of pathological speech

Type of samples

Sample                               N
Dutch Intelligibility Assessment   357
Articulation assessment             21
Sentences                          211
Text                               172
Text Marloes                       221
Spontaneous speech                  39
Semi spontaneous speech            136
Sustained vowel                    216
Diadochokinetic rate               214
Formant transition                 212

Page 8: Building a corpus of pathological speech

Dutch Intelligibility Assessment (DIA)

Intelligibility at phoneme level

50 consonant-vowel-consonant (CVC) words

3 subtests:

• A: initial consonants (19 words)
• B: final consonants (15 words)
• C: medial vowels/diphthongs (16 words)

Balanced mix of existing and non-existing (well pronounceable) words

Large pool of test items: 25 lists per subtest, giving 25*25*25 = 15,625 different tests
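The size of the test pool follows from choosing one of the 25 lists independently for each of the three subtests. The sketch below works through that arithmetic and draws one random test. The function names `count_tests` and `draw_test` are hypothetical, and labels such as `A3` simply follow the list names shown on the slides; this is not the authors' software.

```python
import random

LISTS_PER_SUBTEST = 25
# Subtest -> number of words per list, as given on the slide.
SUBTESTS = {"A": 19, "B": 15, "C": 16}


def count_tests(lists_per_subtest=LISTS_PER_SUBTEST, n_subtests=len(SUBTESTS)):
    """Number of distinct tests when one list is picked per subtest."""
    return lists_per_subtest ** n_subtests


def draw_test(rng):
    """Pick one list label per subtest, e.g. ['A3', 'B22', 'C11']."""
    return [f"{name}{rng.randint(1, LISTS_PER_SUBTEST)}" for name in SUBTESTS]


print(sum(SUBTESTS.values()))  # 50 words per test
print(count_tests())           # 15625 different tests
print(draw_test(random.Random()))
```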

Page 9: Building a corpus of pathological speech

DIA

List A3

1. vop  2. ziep  3. fuis  4. deek  5. koen  6. hom  7. dar  8. paam  9. mil  10. boos  11. son  12. geur  13. nee  14. taf  15. oes  16. loon  17. ruk  18. joef  19. wout

List B22

1. geen  2. diem  3. zoem  4. daai  5. jog  6. peef  7. zaar  8. paat  9. tik  10. vang  11. boop  12. lieuw  13. roos  14. toe  15. riel

List C11

1. gul  2. zuut  3. det  4. wok  5. waan  6. heun  7. nout  8. vees  9. meul  10. wiel  11. sas  12. tuik  13. oet  14. rood  15. min  16. deil

16-year-old girl, stroke, dysarthria, PI: 40%

79-year-old male, TL, TE-speech, PI: 68%

Page 10: Building a corpus of pathological speech

DIA

List A10

1. .op    ø b d f g h j k l m n p r s t v w z

1. dop

2. nuis

3.

top

Intelligibility: percentage of phonemes correctly understood
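The score defined above can be computed by comparing each target phoneme with the phoneme the listener reported. This is a minimal sketch of that calculation. The function name `intelligibility` is hypothetical and the example responses are made up for illustration; they are not corpus data.

```python
def intelligibility(targets, perceived):
    """Percentage of target phonemes that the listener identified correctly."""
    if not targets:
        raise ValueError("at least one test item is required")
    if len(targets) != len(perceived):
        raise ValueError("one perceived phoneme is needed per target")
    correct = sum(t == p for t, p in zip(targets, perceived))
    return 100.0 * correct / len(targets)


# Made-up example: four initial consonants, two of them misheard.
targets = ["d", "n", "t", "k"]
perceived = ["t", "n", "t", "g"]
print(intelligibility(targets, perceived))  # 50.0
```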

Page 11: Building a corpus of pathological speech

DIA

[Figure: histogram of the DIA scores. x-axis: score (-20 to 100); y-axis: number of persons (0 to 90).]

Page 12: Building a corpus of pathological speech

Annotations DIA

• Praat
• 2 tiers
  • Tier 1: target word
  • Tier 2: fixed frame + perceived phoneme
    • . VC
    • CV.
    • C.C
• Orthographic transcriptions

Page 13: Building a corpus of pathological speech

List A

Target phoneme: initial consonant

Fixed frame: . V C
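The fixed frames can be derived mechanically from a target word once it is split into its three CVC parts. This is a sketch under stated assumptions: words are given as (onset, nucleus, coda) tuples so that multi-letter spellings such as "ie" stay together, and the helpers `fixed_frame` and `tier2_label` are hypothetical names. Reading tier 2 as "the frame with the perceived phoneme filled in" is also an assumption.

```python
# Position of the tested phoneme in an (onset, nucleus, coda) triple.
TESTED_SLOT = {"A": 0, "B": 2, "C": 1}


def fixed_frame(subtest, cvc):
    """Replace the tested phoneme with '.', e.g. ('d', 'o', 'p') -> '.op'."""
    parts = list(cvc)
    parts[TESTED_SLOT[subtest]] = "."
    return "".join(parts)


def tier2_label(subtest, cvc, perceived):
    """Fill the perceived phoneme into the frame (assumed tier 2 content)."""
    parts = list(cvc)
    parts[TESTED_SLOT[subtest]] = perceived
    return "".join(parts)


print(fixed_frame("A", ("d", "o", "p")))       # .op
print(fixed_frame("B", ("t", "i", "k")))       # ti.
print(fixed_frame("C", ("g", "u", "l")))       # g.l
print(tier2_label("A", ("d", "o", "p"), "t"))  # top
```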

Page 14: Building a corpus of pathological speech

Articulation assessment

• Children
• Insufficient reading skills
• Logo-Art (Baarda et al., 2001)
• Picture naming test
• Annotations:
  • Orthographic
  • Tier 1: target
  • Tier 2: perceived utterance (no fixed frame)

Page 15: Building a corpus of pathological speech

Sentences

Motor Speech Profile (Kay Elemetrics)

‘Wil je liever de thee of de borrel?’ (Would you rather have the tea or the drink?)
‘Na nieuwjaar was hij weeral hier’ (After New Year he was here again)

N = 211
Orthographic transcriptions
Tier 1 – tier 2; no word boundaries

man, no speech pathology

18 year-old male, congenital dysarthria

Page 16: Building a corpus of pathological speech

Text Marloes and Text

• Text ‘Papa en Marloes’
  • standardized text
  • balanced representation of Dutch phonemes
  • often used in clinical practice
• Text
  • different texts with the same reading level
• orthographic transcriptions
  • 2 tiers
  • boundaries between sentences

Page 17: Building a corpus of pathological speech

(Semi) Spontaneous speech

• Spontaneous
• Semi spontaneous: randomly selected sequence of pictures
• No annotations available

Page 18: Building a corpus of pathological speech

Future

• Gradually increase the number of samples
• DIA validation SPACE intelligibility assessment
• DIA sentence level: > 200 control speakers 3*6 sentences + annotations + pathological samples

Page 19: Building a corpus of pathological speech

Thank you!