Browser Evaluation Test …A Trial Run Pierre Wellner & Mike Flynn, IDIAP Fribourg Nov 26, 2004 Mike Flynn, Pierre Wellner IDIAP Simon Tucker, Steve Whittaker University of Sheffield

Browser Evaluation Test

…A Trial Run

Pierre Wellner & Mike Flynn, IDIAP

Fribourg, Nov 26, 2004

Mike Flynn, Pierre Wellner, IDIAP

Simon Tucker, Steve Whittaker, University of Sheffield

Outline

• Reminder of BET
• Trial Run
• Results
• Analysis
• Future work

Reminder

• What is a Browser for?

  “Browsing a meeting recording is an attempt to find a maximum number of observations of interest in a minimum amount of time.”

• “Observations of Interest”
  – Pairs of complementary statements about the meeting
  – Of interest to… the participants, or to people who missed the meeting.
• Observers
  – Unlimited access
  – No time limit
    • actually 4½ x meeting time (on average)
• Subjects
  – Answer as many Questions as possible
  – Time limit: ½ meeting time
  – Questions are observation pairs, without indication

The BET Process

[Figure: BET process diagram. Labelled elements: meeting participants, recording system, corpus, observers, playback system, observations, sampling, tests, subjects, browser under test, answers, scoring, scores.]
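The question-and-scoring step of the process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the slides do not say how statements are presented to subjects, and the sketch assumes a subject picks one statement of each pair and scores a point when it is the true one. Which statement of the sofa pair is true is not stated in the slides, so the assignment below is arbitrary.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationPair:
    """One observation of interest: a true statement and its false complement."""
    true_statement: str
    false_statement: str


def make_question(pair, rng):
    """Return both statements in random order, plus the index of the true one."""
    statements = [pair.true_statement, pair.false_statement]
    rng.shuffle(statements)
    return statements, statements.index(pair.true_statement)


def score(answers):
    """Fraction of answered questions that were answered correctly.

    `answers` is a list of (chosen_index, true_index) tuples.
    """
    if not answers:
        return 0.0
    correct = sum(1 for chosen, truth in answers if chosen == truth)
    return correct / len(answers)


# Example pair taken from the slides; the true/false assignment is arbitrary.
pair = ObservationPair(
    true_statement="Agnes thinks having the sofa along the whiteboard is a good idea.",
    false_statement="Agnes thinks the sofa will be in the way if under the whiteboard.",
)
rng = random.Random(0)
statements, true_index = make_question(pair, rng)

# One correct and one incorrect answer to the same question.
answers = [(true_index, true_index), (1 - true_index, true_index)]
half_right = score(answers)
```

Under this scoring a subject with no media at all would be expected to land near 50%, which is why a "Guess" condition is a useful reference point.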

Trial Run: Observers

• Needed native English speakers
  – University of Sheffield
  – Students, researchers, lecturers
• Meetings: 1 x 44 minutes
• Observers: 6
• Observations: 294 (only 255 used)

Observer’s Screen Shot

Observations… about the observations

• Examples:
    Agnes thinks having the sofa along the whiteboard is a good idea.
    Agnes thinks the sofa will be in the way if under the whiteboard.
    Martin wants to put the coffee machine along the left wall.
    Martin wants to put the coffee machine along the right wall.
• Mainly about what was said, not done
• Participants' names all in top ten words
  – Others: the, of, to, at, is, that
• 283/294 (83%) use participant by name
• Observation density…
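Word statistics of this kind are straightforward to compute. The following is a minimal sketch, not the authors' analysis: it runs only on the four example observations quoted above, the participant list contains only the two names visible in those examples, and the tokenizer is a simplifying assumption.

```python
import re
from collections import Counter

# The four example observations quoted on the slide.
observations = [
    "Agnes thinks having the sofa along the whiteboard is a good idea.",
    "Agnes thinks the sofa will be in the way if under the whiteboard.",
    "Martin wants to put the coffee machine along the left wall.",
    "Martin wants to put the coffee machine along the right wall.",
]

# Assumption: only the names that appear in the examples are known here.
participants = {"agnes", "martin"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


# Frequency of every word across all observations.
word_counts = Counter(word for obs in observations for word in tokenize(obs))
top_words = [word for word, _ in word_counts.most_common(10)]

# Share of observations that mention at least one participant by name.
named = sum(1 for obs in observations if participants & set(tokenize(obs)))
fraction_named = named / len(observations)
```

On the full set of observations the same two measures would give the top-ten word list and the share of observations that use a participant's name.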

Observation Density Graph

[Figure: observations per minute (y-axis, 0 to 20) against media time (x-axis, 00:00 to 50:00).]

Trial Run: Subjects

• 11f + 13m = 24 total
• University of Sheffield
• Three conditions:
  – “Guess” - no media whatsoever
  – “Base” - same media as Observers
  – “F1” - Ferret with Brno ASR transcript + slides + speaker segmentations

Guess Condition Screen Shot

Base Condition Screen Shot

F1 Condition Screen Shot

Results: Guess Condition

Subject  Answers  Correct  Incorrect  Score
A1           255      142        113  55.7%
A2           220      123         97  55.9%
A3           135       81         54  60.0%
Total        610      346        264  56.7%

Results: Base Condition

Subject     Answers  Correct  Incorrect  Score
B1               22       14          8    63%
B2               25       17          8    68%
B3               12        7          5    58%
B4                8        8          0   100%
B5                5        2          3    40%
B6                3        1          2    33%
B7               12        8          4    66%
B8                5        4          1    80%
B9                8        3          5    37%
B10              22       12         10    54%
B11               4        4          0   100%
Base Total      126       80         46  63.5%

Results: F1 Condition

Subject   Answers  Correct  Incorrect  Score
C1             20       11          9    55%
C2              6        3          3    50%
C3             18       17          1    94%
C4             21       12          9    57%
C5             18       11          7    61%
C6             11        7          4    63%
C7              6        6          0   100%
C8             14       10          4    71%
C9             12       11          1    91%
C10             7        2          5    28%
F1 Total      133       90         43  67.7%

Details

• Scores by time
• Media time-difference
• Speed versus accuracy

Scores by Time

[Figure: results by time, overlaid. Average score per question (y-axis, 0% to 70%) against time left (x-axis, 0 to 23), with one series each for the Base, F1 and Guess conditions.]

Proximity of Answers to Questions

[Figure: media time difference histogram. Number of answers (y-axis, -5 to 20) against media time difference in minutes (x-axis, -45 to 45), with separate series for incorrect and correct answers.]

Speed versus Accuracy

[Figure: speed versus accuracy graph. Questions correct (y-axis, 0% to 100%) against questions answered (x-axis, 0 to 30); series: Base condition, F1 condition, Base mean, F1 mean, Guess mean.]

BET scores

Condition  Speed  Accuracy
Guess       27.7     56.7%
Base         5.7     63.5%
F1           6.0     67.7%
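The two BET scores can be recomputed from the per-condition totals in the results tables. The sketch below is a hedged reading, not a definition taken from the slides: accuracy is taken as correct answers divided by answers given, and speed is assumed to be the condition's total answers divided by the 22-minute time limit (half of the 44-minute meeting), pooled over all subjects in the condition. That interpretation is inferred only because it matches the reported figures.

```python
MEETING_MINUTES = 44
TIME_LIMIT_MINUTES = MEETING_MINUTES / 2  # subjects get half the meeting time

# (total answers, total correct) per condition, from the results tables.
totals = {
    "Guess": (610, 346),
    "Base": (126, 80),
    "F1": (133, 90),
}


def summarize(answers, correct, minutes=TIME_LIMIT_MINUTES):
    """Return (speed, accuracy in percent), each rounded to one decimal."""
    speed = answers / minutes      # assumed: pooled answers per minute
    accuracy = correct / answers   # share of answers that were correct
    return round(speed, 1), round(100 * accuracy, 1)


summary = {name: summarize(*counts) for name, counts in totals.items()}

for name, (speed, accuracy) in summary.items():
    print(f"{name:<6} {speed:>5} {accuracy:>5}%")
```

Because speed is pooled in this reading, it is not comparable across conditions with different numbers of subjects without dividing by the subject count.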

Future work

• AMI recording 100 hour corpus
• More observations
• More subjects
  – reduce confidence interval (~18% wide)
• Design, test & compare browser improvements
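To see why more observations and subjects narrow the confidence interval, here is a rough sketch using a normal-approximation (Wald) interval on the pooled accuracy. The slides do not say how the ~18% figure was obtained, so this is an assumed method rather than the authors' calculation. It treats every answer as an independent trial and ignores variation between subjects, which would widen the interval.

```python
import math


def wald_interval(correct, answers, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wald method)."""
    p = correct / answers
    half_width = z * math.sqrt(p * (1 - p) / answers)
    return p - half_width, p + half_width


# Pooled (correct, answers) totals from the Base and F1 results tables.
conditions = {"Base": (80, 126), "F1": (90, 133)}

widths = {}
for name, (correct, answers) in conditions.items():
    low, high = wald_interval(correct, answers)
    widths[name] = high - low
    print(f"{name}: {low:.3f} to {high:.3f} (width {widths[name]:.3f})")

# With the same accuracy, four times as many answers halves the width.
low4, high4 = wald_interval(4 * 80, 4 * 126)
width_base_x4 = high4 - low4
```

The interval width falls with the square root of the number of answers, so halving it takes roughly four times as much data.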