An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko -...

82
An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages Antonios Anastasopoulos 1 , David Chiang 1 , Long Duong 2 1 University of Notre Dame, USA 2 University of Melbourne, Australia

Transcript of An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko -...

Page 1: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

An Unsupervised Probability Model for Speech-to-Translation Alignment

of Low-Resource Languages

Antonios Anastasopoulos1, David Chiang1, Long Duong2

1 University of Notre Dame, USA 2 University of Melbourne, Australia

Page 2: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Why Speech-based MT?

90% of languages do not have a writing system

Motivation

2

Page 3: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Why Speech-based MT?

90% of languages do not have a writing system

Motivation

2

Page 4: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Why Speech-based MT?

90% of languages do not have a writing system

Motivation

2

Page 5: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Endangered languages documentation

Use speech with translations

Motivation

3

Using the Aikuma (Bird 2010) app to collect parallel speech

Page 6: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Low-resource languages

Utilize translations rather than transcriptions

Motivation

4

Page 7: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Task Description

5

Page 8: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Task Description

5

Page 9: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task Description

5

…a

little bit of

knowledge

Page 10: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 11: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 12: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 13: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 14: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 15: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Page 16: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Source side: Frames of the speech signal

Target side: Translation text

Task: find best alignment between source and target side

Task Description

5

…a

little bit of

knowledge

Our method outperforms both baselines

Page 17: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

6

Page 18: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

go dog go

6

vaya vaya perro

Page 19: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

go dog go

6

vaya vaya perro

Page 20: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

go dog go

6

vaya vaya perro

Page 21: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

go dog go

6

vaya vaya perro

f8 : f8 :f13 :

Page 22: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

The big picture

go dog go

6

vaya vaya perro

f8 : f8 :f13 :

Page 23: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Distortion model

Controls the reordering of the target words - based on fast-align [Dyer et al.]

7

Page 24: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Distortion model

Controls the reordering of the target words - based on fast-align [Dyer et al.]

Original

7

Page 25: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Modification

Distortion model

Controls the reordering of the target words - based on fast-align [Dyer et al.]

7

Span Start Span End

Page 26: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

8

Page 27: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

8

Page 28: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 29: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 30: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 31: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 32: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 33: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 34: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Clustering model

Assuming a “prototype” for each cluster

f8

8

Page 35: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Training

Expectation-Maximization

9

Page 36: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Training

Expectation-Maximization

9

Page 37: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters

Training

Expectation-Maximization

9

Page 38: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters

Training

Expectation-Maximization

9

Page 39: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters

Training

Expectation-Maximization

9

Page 40: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters

Training

Expectation-Maximization

9

Page 41: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes

Training

Expectation-Maximization

9

Page 42: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes

Training

Expectation-Maximization

9

Page 43: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes

Training

Expectation-Maximization

9

Page 44: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes• E step:

Training

Expectation-Maximization

9

Page 45: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes• E step:

- Assign cluster and align

Training

Expectation-Maximization

9

Page 46: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes• E step:

- Assign cluster and align

Training

Expectation-Maximization

9

Page 47: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes• E step:

- Assign cluster and align

Training

Expectation-Maximization

9

Page 48: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Initialize spans and clusters• M step:

- Re-estimate prototypes• E step:

- Assign cluster and align- We restrict the search space:

- voice activity detection - phone boundary detection [Khanaga et al.]

Training

Expectation-Maximization

9

Page 49: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Experiments

10

Language Pair Dataset Number of utterances

Griko - Italian [Lekakou et al] 330

Spanish - EnglishCALLHOME (sample) 2k

CALLHOME (all) 17kFisher 143k

Page 50: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Baselines

11

Page 51: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Baselines

11

• Naive:• frames/word ~ #characters• along the diagonal

Page 52: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Baselines

11

• Naive:• frames/word ~ #characters• along the diagonal

Austin is great

Page 53: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Baselines

11

• Naive:• frames/word ~ #characters• along the diagonal

• Neural [Duong et al]:• DNN optimised for direct translation of speech• convert attention mechanism weights to alignments

Austin is great

Page 54: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment F-score

12

Results

Page 55: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment F-scoreF-

scor

e

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

12

Results

Page 56: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment F-scoreF-

scor

e

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

27.8

35.735.8

46.7

12

Results

Page 57: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment F-scoreF-

scor

e

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

26.229.126.427.0 27.8

35.735.8

46.7

12

Results

Page 58: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment F-scoreF-

scor

e

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

30.8

38.638.8

53.8

26.229.126.427.0 27.8

35.735.8

46.7

12

Results

Page 59: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment Precision

13

Results

Page 60: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment PrecisionPr

ecis

ion

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

13

Results

Page 61: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment PrecisionPr

ecis

ion

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

24.0

31.831.9

42.2

13

Results

Page 62: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment PrecisionPr

ecis

ion

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

24.726.123.824.6 24.0

31.831.9

42.2

13

Results

Page 63: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment PrecisionPr

ecis

ion

0

10

20

30

40

50

60

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

33.338.438.8

56.6

24.726.123.824.6 24.0

31.831.9

42.2

13

Results

Page 64: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

ResultsWord-level F-score

14

Word Length (frames)

aver

age

F-sc

ore

Page 65: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Example

15

Page 66: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Conclusion

Alignment model

Extension of IBM-2 with fast-align for speech-to-translation

k-means clustering with DTW and DBA

Improvements in F-score and particularly Precision

https://bitbucket.org/ndnlp/speech2translation

16

Page 67: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment Recall

17

Results

Page 68: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment RecallRe

call

0

10

20

30

40

50

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

17

Results

Page 69: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment RecallRe

call

0

10

20

30

40

50

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

33.2

40.740.8

52.2

17

Results

Page 70: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment RecallRe

call

0

10

20

30

40

50

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

27.832.9

29.830.033.2

40.740.8

52.2

17

Results

Page 71: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Alignment RecallRe

call

0

10

20

30

40

50

Griko-Italian Callhome (2k) Callhome (17k) Fisher (143k)

naive neural ours

28.7

38.838.9

51.2

27.832.9

29.830.033.2

40.740.8

52.2

17

Results

Page 72: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Example

18

Page 73: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Example

19

Page 74: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Proper model

20

Deficient:

Page 75: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Proper model

20

Proper:

Page 76: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Proper model

20

Proper:

The proper model performs much worse. -It favours too long or too short spans

Page 77: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Example

21

Page 78: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

Background: DTW and DBA

Dynamic Time Warping (DTW)

22

Page 79: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

DTW Barycenter Averaging (DBA)

Background: DTW and DBA

22

Page 80: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

M-step:

Prototype estimation with DTW Barycenter Averaging

23

Page 81: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

M-step:

Prototype estimation with DTW Barycenter Averaging

23

Page 82: An Unsupervised Probability Model for Speech-to ...aanastas/research/EMNLP2016slides.pdfGriko - Italian [Lekakou et al] 330 Spanish - English CALLHOME (sample) 2k CALLHOME (all) 17k

M-step:

Prototype estimation with DTW Barycenter Averaging

23