New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Maria Eskevich 1, Walid Magdy 2,3, Gareth J.F. Jones 1,2
1 Centre for Digital Video Processing
2 Centre for Next Generation Localisation
School of Computing, Dublin City University, Dublin, Ireland
3 Qatar Computing Research Institute - Qatar Foundation, Doha, Qatar
April 3, 2012

description

We introduce two new metrics for the evaluation of search effectiveness for informally structured speech data: mean average segment precision (MASP) which measures retrieval performance in terms of both content segmentation and ranking with respect to relevance; and mean average segment distance-weighted precision (MASDWP) which takes into account the distance between the start of the relevant segment and the retrieved segment. We demonstrate the effectiveness of these new metrics on a retrieval test collection based on the AMI meeting corpus.

Transcript of New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Page 1: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Maria Eskevich 1, Walid Magdy 2,3, Gareth J.F. Jones 1,2

1 Centre for Digital Video Processing
2 Centre for Next Generation Localisation
School of Computing, Dublin City University, Dublin, Ireland

3 Qatar Computing Research Institute - Qatar Foundation, Doha, Qatar

April 3, 2012

Page 2: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Outline

- Speech Retrieval
- Speech Search Evaluation
  - Mean Average Precision (MAP)
  - Mean Average interpolated Precision (MAiP)
  - mean Generalized Average Precision (mGAP)
- New Metrics
  - Mean Average Segment Precision (MASP)
  - Mean Average Segment Distance-Weighted Precision (MASDWP)
- Retrieval Collection
- Experimental Results
- Conclusions

Page 3: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Speech Retrieval

Speech Documents Diversity

- Broadcast news
- Meetings

Page 8: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Speech Retrieval

[Diagram: speech retrieval pipeline. Components shown: Speech Collection, Queries (audio), Queries (text), Automatic Speech Recognition System, Transcript, Segmentation, Segments, Indexing, Indexed Segments, Information Request, Retrieval, Retrieval Results: textual segments, Retrieval Results: speech segments]

Page 19: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Speech Search Evaluation

Related Work in Speech Search Evaluation

Retrieval units:

- Clearly defined documents:
  - TREC SDR: Mean Average Precision (MAP)
- Passages:
  - INEX: Mean Average interpolated Precision (MAiP)
- Jump-in points:
  - CLEF CL-SR: mean Generalized Average Precision (mGAP)

Page 24: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Mean Average interpolated Precision (MAiP)

Task: passage text retrieval.

Document relevance is not counted in a binary way.

Precision at rank r: the fraction of retrieved characters that are relevant.

Average interpolated Precision (AiP): the average of the interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00):

\[ \mathrm{AiP} = \frac{1}{101} \sum_{x \in \{0.00, 0.01, \ldots, 1.00\}} iP[x] \]

Shortcoming: averaging over characters in the transcript is not suitable for speech tasks.
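The AiP definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the INEX evaluation tool: the function name, the `(relevant_chars, retrieved_chars)` input layout and the `total_relevant_chars` argument are hypothetical, and iP[x] is assumed to be the best character-level precision at any rank whose recall reaches x (0 if no rank does).

```python
from typing import List, Tuple


def average_interpolated_precision(
    passages: List[Tuple[int, int]], total_relevant_chars: int
) -> float:
    """AiP over 101 recall levels for one ranked list (illustrative sketch).

    passages: (relevant_chars, retrieved_chars) per rank, in ranked order.
    total_relevant_chars: relevant characters that exist for the query.
    """
    if total_relevant_chars <= 0:
        return 0.0

    # Character-level precision and recall after each rank.
    points = []
    rel_so_far = 0
    ret_so_far = 0
    for rel_chars, ret_chars in passages:
        rel_so_far += rel_chars
        ret_so_far += ret_chars
        precision = rel_so_far / ret_so_far if ret_so_far else 0.0
        recall = rel_so_far / total_relevant_chars
        points.append((recall, precision))

    # iP[x]: best precision at any rank whose recall reaches x (assumption).
    total = 0.0
    for i in range(101):
        x = i / 100
        reachable = [p for rec, p in points if rec >= x]
        total += max(reachable, default=0.0)
    return total / 101
```

Because everything is counted in characters of the transcript, the score says nothing about how long the user has to listen, which is the shortcoming noted on the slide.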

Page 31: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

mean Generalized Average Precision (mGAP)

Task: retrieval of the jump-in points in time for relevant content.

\[ \mathrm{GAP} = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \left(1 - \frac{\mathit{Distance}}{\mathit{Granularity}} \cdot 0.1\right) \]

Shortcoming: does not take into account how much time the user needs to spend listening to access the relevant content.
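A minimal Python sketch of the GAP formula as written on the slide may help make the penalty term concrete. Several things here are assumptions rather than part of the slides: n is taken to be the number of relevant items for the query, ranks that match nothing relevant are marked with `None` and contribute nothing, and the penalty is floored at 0. The function names are hypothetical, and the official CLEF CL-SR scoring may differ in detail.

```python
from typing import List, Optional


def distance_penalty(distance: float, granularity: float) -> float:
    """1 - (Distance / Granularity) * 0.1, floored at 0 (flooring is an assumption)."""
    return max(0.0, 1.0 - (distance / granularity) * 0.1)


def generalized_average_precision(
    distances: List[Optional[float]], granularity: float, n_relevant: int
) -> float:
    """GAP for one ranked list of jump-in points (illustrative sketch).

    distances[i] is the time distance (same unit as granularity) between the
    jump-in point retrieved at rank i + 1 and the true start of the relevant
    content, or None when that rank matches nothing relevant.
    """
    if n_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, distance in enumerate(distances, start=1):
        if distance is None:
            continue  # non-relevant ranks contribute nothing
        hits += 1
        precision_at_rank = hits / rank  # P[r]
        total += precision_at_rank * distance_penalty(distance, granularity)
    return total / n_relevant
```

Note that the only time-related input is the distance to the jump-in point; the length of what the user then has to listen to never enters the score.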

Page 33: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

New Metrics

Time Precision Oriented Metrics

Motivation:

- Create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score.
- Reflect how far the user has to listen into the segment at a certain rank until the relevant part actually begins.

Page 34: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Mean Average Segment Precision (MASP)

Segment Precision (SP[r]) at rank r: the length of relevant content retrieved up to rank r divided by the total length of the segments retrieved up to rank r.

Average Segment Precision:

\[ \mathrm{ASP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \]

rel(s_r) = 1 if relevant content is present, otherwise rel(s_r) = 0.

Difference from other metrics:

- the amount of relevant content is measured over time instead of text
- average segment precision (ASP) is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP
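The ASP definition can be sketched in Python as follows. This is a minimal illustration, assuming SP[r] is the cumulative relevant length divided by the cumulative retrieved length through rank r, and assuming n defaults to the number of retrieved segments that contain relevant content; the function name and the `(relevant_length, total_length)` input layout are hypothetical.

```python
from typing import List, Optional, Tuple


def average_segment_precision(
    segments: List[Tuple[float, float]], n_relevant: Optional[int] = None
) -> float:
    """ASP for one ranked list (illustrative sketch).

    segments: (relevant_length, total_length) in time units, in ranked order.
    n_relevant: the n in the formula; defaults to the number of retrieved
    segments that contain relevant content (an assumption).
    """
    rel_so_far = 0.0
    len_so_far = 0.0
    total = 0.0
    hits = 0
    for rel_len, seg_len in segments:
        rel_so_far += rel_len
        len_so_far += seg_len
        if rel_len > 0:  # rel(s_r) = 1
            hits += 1
            total += rel_so_far / len_so_far  # SP[r]
    n = hits if n_relevant is None else n_relevant
    return total / n if n else 0.0
```

For instance, a ranked list with relevant/total lengths 2/3, 0/5, 3/4, 6/6, 0/2 and 5/10 gives `average_segment_precision([(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)])`, which rounds to 0.557. MASP is then the mean of ASP over all queries.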

Page 40: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Mean Average Segment Distance-Weighted Precision (MASDWP)

Penalize ASP results in the same way as mGAP:

\[ \mathrm{ASDWP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \cdot \left(1 - \frac{\mathit{Distance}}{\mathit{Granularity}} \cdot 0.1\right) \]
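As a rough sketch, the distance weighting can be added to a segment-precision loop like this. The assumptions are labeled in the code: SP[r] is the cumulative relevant length over the cumulative retrieved length, the penalty is floored at 0, n defaults to the number of retrieved segments with relevant content, and the `Segment` tuple layout and function name are hypothetical.

```python
from typing import List, Optional, Tuple

# (relevant_length, total_length, distance); distance is the time between the
# start of the retrieved segment and the start of the relevant content, or
# None when the segment holds no relevant content. Hypothetical layout.
Segment = Tuple[float, float, Optional[float]]


def asdwp(
    segments: List[Segment], granularity: float, n_relevant: Optional[int] = None
) -> float:
    """ASDWP for one ranked list (illustrative sketch)."""
    rel_so_far = 0.0
    len_so_far = 0.0
    total = 0.0
    hits = 0
    for rel_len, seg_len, distance in segments:
        rel_so_far += rel_len
        len_so_far += seg_len
        if rel_len <= 0 or distance is None:
            continue  # rel(s_r) = 0
        hits += 1
        # Penalty term shared with mGAP; flooring at 0 is an assumption.
        penalty = max(0.0, 1.0 - (distance / granularity) * 0.1)
        total += (rel_so_far / len_so_far) * penalty  # SP[r] * penalty
    n = hits if n_relevant is None else n_relevant
    return total / n if n else 0.0
```

A segment that starts exactly at the relevant content keeps its full SP[r]; one that starts ten or more granularity steps away contributes nothing, even though it still counts toward n.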

Page 41: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Comparative example of AP, ASP and ASDWP

Retrieved segment | Rel Len/Total Len | AP  | ASP   | ASDWP
1                 | 2/3               | 1   | 2/3   | 2/3 * 1.0
2                 | 0/5               | 1/2 | 2/8   | 2/8 * 0.0
3                 | 3/4               | 2/3 | 5/12  | 5/12 * 0.9
4                 | 6/6               | 3/4 | 11/18 | 11/18 * 0.0
5                 | 0/2               | 3/5 | 11/20 | 11/20 * 0.0
6                 | 5/10              | 4/6 | 16/30 | 16/30 * 0.0

MAP 0.771 | MASP 0.557 | MASDWP 0.260
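The three scores can be checked by hand. Segments 2 and 5 contain no relevant content, so only ranks 1, 3, 4 and 6 contribute, and n = 4 (assuming the four relevant segments shown are all the relevant segments for this query):

\[
\begin{aligned}
\mathrm{AP} &= \tfrac{1}{4}\left(1 + \tfrac{2}{3} + \tfrac{3}{4} + \tfrac{4}{6}\right) \approx 0.771 \\
\mathrm{ASP} &= \tfrac{1}{4}\left(\tfrac{2}{3} + \tfrac{5}{12} + \tfrac{11}{18} + \tfrac{16}{30}\right) \approx 0.557 \\
\mathrm{ASDWP} &= \tfrac{1}{4}\left(\tfrac{2}{3} \cdot 1.0 + \tfrac{5}{12} \cdot 0.9 + \tfrac{11}{18} \cdot 0.0 + \tfrac{16}{30} \cdot 0.0\right) \approx 0.260
\end{aligned}
\]

The drop from AP to ASP reflects the non-relevant time inside the retrieved segments; the further drop to ASDWP comes from the segments at ranks 4 and 6, whose distance weight is 0.0.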

Page 53: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Retrieval Collection

Test Collection

Speech collection: AMI Corpus
- Ca. 100 hours of data (80 hours of speech)
- 160 meetings:
  - average length: 30 minutes
- Transcript:
  - Manual
  - Automatic Speech Recognition (ASR), WER ≈ 30%

Retrieval test set:
- 25 queries with text taken from PowerPoint slides provided with the AMI Corpus (average length > 10 content words)
- Manual relevance assessment

Page 54: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Segmentation Methods and Retrieval Runs

- Segmentation*:
  - Lexical cohesion based algorithms: TextTiling, C99
  - Time- and length-based algorithms: time length = 60, 120, 150, 180 seconds; number of words per segment = 300, 400
  - Extreme case: no segmentation
- Retrieval system:
  - SMART extended to use language modeling

* Manual boundaries for both types of transcript
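The time- and length-based segmentation methods listed above can be illustrated with a short Python sketch. It is not the authors' implementation: the `Word` tuple layout, both function names, and the choice to start the first window at the first word and to skip empty windows are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Word = Tuple[float, str]  # (start time in seconds, token); hypothetical layout


def segment_by_time(words: List[Word], window_seconds: float) -> List[List[Word]]:
    """Group time-stamped words into consecutive windows of fixed duration."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    segments: List[List[Word]] = []
    current: List[Word] = []
    window_end: Optional[float] = None
    for start, token in sorted(words):
        if window_end is None:
            window_end = start + window_seconds
        # Advance the window until it covers this word; empty windows are skipped.
        while start >= window_end:
            if current:
                segments.append(current)
                current = []
            window_end += window_seconds
        current.append((start, token))
    if current:
        segments.append(current)
    return segments


def segment_by_length(words: List[Word], words_per_segment: int) -> List[List[Word]]:
    """Group words into consecutive segments with a fixed number of words."""
    if words_per_segment <= 0:
        raise ValueError("words_per_segment must be positive")
    ordered = sorted(words)
    return [
        ordered[i:i + words_per_segment]
        for i in range(0, len(ordered), words_per_segment)
    ]
```

With the settings from the slide, `segment_by_time(words, 60)` would correspond to the 60-second run and `segment_by_length(words, 300)` to the 300-word run; the "no segmentation" extreme simply keeps each meeting as one segment.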

Page 56: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Experimental Results

Scores

Results for 1000 retrieved documents, asr man:

Run      | MAP   | MAiP  | MASP  | MASDWP
c99      | 0.438 | 0.275 | 0.218 | 0.177
tt       | 0.421 | 0.275 | 0.221 | 0.173
len 300  | 0.416 | 0.287 | 0.248 | 0.181
len 400  | 0.463 | 0.286 | 0.237 | 0.147
time 120 | 0.428 | 0.296 | 0.256 | 0.196
time 150 | 0.448 | 0.283 | 0.243 | 0.171
time 180 | 0.473 | 0.300 | 0.246 | 0.163
time 60  | 0.333 | 0.259 | 0.238 | 0.220
one doc  | 0.686 | 0.109 | 0.085 | 0.009

- one doc run: only MAP gives it the highest score; all other metrics give it the lowest score -> the MAP score contradicts user experience
- time 60: ranked highest by MASDWP -> the shorter average length of the segments makes it easier to capture a segment closer to the jump-in point
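The two observations can be checked directly against the table. The short Python sketch below copies the scores from the table and sorts the runs under each metric; the `SCORES` dictionary and the `rank_runs` helper are illustrative names, not part of the original work.

```python
# Scores copied from the results table: run -> (MAP, MAiP, MASP, MASDWP).
SCORES = {
    "c99": (0.438, 0.275, 0.218, 0.177),
    "tt": (0.421, 0.275, 0.221, 0.173),
    "len 300": (0.416, 0.287, 0.248, 0.181),
    "len 400": (0.463, 0.286, 0.237, 0.147),
    "time 120": (0.428, 0.296, 0.256, 0.196),
    "time 150": (0.448, 0.283, 0.243, 0.171),
    "time 180": (0.473, 0.300, 0.246, 0.163),
    "time 60": (0.333, 0.259, 0.238, 0.220),
    "one doc": (0.686, 0.109, 0.085, 0.009),
}
METRICS = ("MAP", "MAiP", "MASP", "MASDWP")


def rank_runs(metric: str) -> list:
    """Return run names sorted from best to worst under the given metric."""
    col = METRICS.index(metric)
    return sorted(SCORES, key=lambda run: SCORES[run][col], reverse=True)


if __name__ == "__main__":
    for metric in METRICS:
        ranking = rank_runs(metric)
        print(f"{metric}: best = {ranking[0]}, worst = {ranking[-1]}")
```

Under MAP the one doc run comes first, while under MAiP, MASP and MASDWP it comes last; under MASDWP the time 60 run comes first.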

Page 63: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Experimental Results New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Capturing Difference Between Segmentations

Rank   c99             time 180        time 60
3                      179/179 (–)     60/60 (–)
4      243/243 (–)     179/179 (–)     59/59 (1)
5                      180/180 (-69)   60/60 (–)
6      105/125 (20)                    59/59 (-10)
7      157/204 (47)    179/179 (0)     59/59 (–)
8      107/107 (-45)   59/179          60/60 (–)
9      350/429 (47)    162/180 (-4)    60/60 (21)
10     122/122 (-11)   143/181 (–)

AP:    one doc > time 180 > c99 > time 60
AiP:   c99 > time 180 > time 60 > one doc
ASP:   time 180 > c99 > time 60 > one doc
ASDWP: c99 > time 180 > time 60 > one doc
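One way to quantify how far the four orderings above disagree is to count pairwise agreements between them. The sketch below is not part of the original slides: it uses Kendall's tau as an illustrative choice, and the underscore run labels and the function name `kendall_tau` are assumptions made for the example.

```python
from itertools import combinations

# Orderings copied from the slide, best run first.
ORDERINGS = {
    "AP":    ["one_doc", "time_180", "c99", "time_60"],
    "AiP":   ["c99", "time_180", "time_60", "one_doc"],
    "ASP":   ["time_180", "c99", "time_60", "one_doc"],
    "ASDWP": ["c99", "time_180", "time_60", "one_doc"],
}


def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (no ties)."""
    pos_a = {run: i for i, run in enumerate(order_a)}
    pos_b = {run: i for i, run in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when both orderings place x and y the same way round.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


for name_a, name_b in combinations(ORDERINGS, 2):
    tau = kendall_tau(ORDERINGS[name_a], ORDERINGS[name_b])
    print(f"{name_a:5s} vs {name_b:5s}: tau = {tau:+.3f}")
```

A value of 1 means two metrics order the runs identically; lower values mean more pairs of runs are swapped. AP is the only ordering here that puts one doc first.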

Page 64: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Experimental Results New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Impact of Averaging Techniques

AiP: man < asr man; ASP: man > asr man (relevant content moves down from higher ranks)

Page 69: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Conclusions New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Conclusions

MAP and MAiP do not reflect the user experience of informally structured speech documents:

- MAP is appropriate for clearly defined documents
- MAiP works with transcript characters

Introduced MASP and MASDWP:

- MASP: captures the amount of relevant content that appears at different ranks

- MASDWP: rewards runs where segmentation algorithms put boundaries closer to the relevant content and these segments are higher in the ranked list
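As a rough illustration of the MASP bullet, the sketch below computes an average segment precision for a single query. It rests on assumptions that this part of the slides does not spell out: segment precision at rank r is taken to be the relevant time retrieved up to r divided by the total segment time retrieved up to r, it is accumulated only at ranks whose segment contains relevant content, and the sum is normalised by a caller-supplied number of relevant segments. The function name and the toy data are hypothetical.

```python
def average_segment_precision(ranked, n_relevant):
    """Sketch of a per-query average segment precision.

    ranked:     list of (relevant_seconds, segment_seconds) in rank order
    n_relevant: number of relevant segments used for normalisation (assumed)
    """
    if n_relevant <= 0:
        raise ValueError("n_relevant must be positive")
    relevant_so_far = 0.0
    length_so_far = 0.0
    total = 0.0
    for relevant, length in ranked:
        relevant_so_far += relevant
        length_so_far += length
        # Only ranks that actually contain relevant content contribute.
        if relevant > 0 and length_so_far > 0:
            total += relevant_so_far / length_so_far
    return total / n_relevant


# Toy data (hypothetical): the same three segments in two rank orders.
good_order = [(60, 60), (0, 60), (30, 60)]
poor_order = [(0, 60), (60, 60), (30, 60)]
print(average_segment_precision(good_order, n_relevant=2))
print(average_segment_precision(poor_order, n_relevant=2))
```

Under these assumptions the score drops when a segment without relevant content is ranked first, which matches the intent of capturing how much relevant content appears at each rank. The distance weighting of MASDWP is not reproduced here.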

Page 72: New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

Conclusions New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Thank you for your attention!