Browser Evaluation Test
…A Trial Run
Pierre Wellner & Mike Flynn, IDIAP, Fribourg, Nov 26, 2004
Mike Flynn, Pierre Wellner (IDIAP); Simon Tucker, Steve Whittaker (University of Sheffield)
Reminder
• What is a Browser for?
  “Browsing a meeting recording is an attempt to find a maximum number of observations of interest in a minimum amount of time.”
• “Observations of Interest”
  – Pairs of complementary statements about the meeting
  – Of interest to… the participants, or to people who missed the meeting
• Observers
  – Unlimited access
  – No time limit
    • actually 4½ × meeting time (on average)
• Subjects
  – Answer as many questions as possible
  – Time limit: ½ meeting time
  – Questions are observation pairs, without indication
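A BET question can be modelled as a small record that holds the two complementary statements. The sketch below is illustrative only. The `ObservationPair` class, its methods and `subject_time_limit` are hypothetical names, not part of the BET tools. It assumes that exactly one statement in each pair is true, and it shuffles the presentation order so the subject gets no cue about which one that is. It also encodes the subjects' time limit of half the meeting length.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationPair:
    """Hypothetical record for one BET question (not the BET tools' format).

    Assumes exactly one of the two complementary statements is true.
    """

    true_statement: str
    false_statement: str

    def presented(self, rng: random.Random) -> list[str]:
        """Both statements in random order, so position gives no hint."""
        options = [self.true_statement, self.false_statement]
        rng.shuffle(options)
        return options

    def is_correct(self, chosen: str) -> bool:
        """An answer is correct if the subject chose the true statement."""
        return chosen == self.true_statement


def subject_time_limit(meeting_minutes: float) -> float:
    """Subjects get half the meeting's duration to answer questions."""
    return meeting_minutes / 2


# Placeholder statements: the slides do not say which real statement is true.
question = ObservationPair("Statement that is true.", "Its false complement.")
options = question.presented(random.Random(42))
print(options, question.is_correct(options[0]))
print(subject_time_limit(44))  # trial meeting of 44 minutes -> 22.0
```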
The BET Process
[Figure: diagram of the BET process, with the components meeting participants, recording system, corpus, observers, playback system, observations, sampling, tests, subjects, browser under test, answers, scoring, and scores.]
Trial Run: Observers
• Needed native English speakers
  – University of Sheffield
  – Students, researchers, lecturers
• Meetings: 1 × 44 minutes
• Observers: 6
• Observations: 294 (only 255 used)
Observations… about the observations
• Examples:
  – Agnes thinks having the sofa along the whiteboard is a good idea.
  – Agnes thinks the sofa will be in the way if under the whiteboard.
  – Martin wants to put the coffee machine along the left wall.
  – Martin wants to put the coffee machine along the right wall.
• Mainly about what was said, not done
• Participants' names all in top ten words
  – Others: the, of, to, at, is, that
• 283/294 (83%) refer to a participant by name
• Observation density…
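The word-frequency and naming statistics above can be reproduced with a simple tokenizer. This is a sketch under assumptions, because the slides do not say how words were counted. `top_words` lowercases the text and splits it on runs of letters and apostrophes. `count_naming` is a hypothetical helper that treats an observation as naming a participant when any participant's name appears in it as a whole word. The example input is the four observations quoted above.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def words(text: str) -> list[str]:
    """Lowercase word tokens (assumed tokenization)."""
    return WORD.findall(text.lower())


def top_words(observations: list[str], n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent words across all observations."""
    counts = Counter()
    for text in observations:
        counts.update(words(text))
    return counts.most_common(n)


def count_naming(observations: list[str], participants: list[str]) -> tuple[int, int]:
    """(observations naming at least one participant, total observations)."""
    names = {p.lower() for p in participants}
    naming = sum(1 for text in observations if names & set(words(text)))
    return naming, len(observations)


EXAMPLES = [
    "Agnes thinks having the sofa along the whiteboard is a good idea.",
    "Agnes thinks the sofa will be in the way if under the whiteboard.",
    "Martin wants to put the coffee machine along the left wall.",
    "Martin wants to put the coffee machine along the right wall.",
]
print(top_words(EXAMPLES, 3))
print(count_naming(EXAMPLES, ["Agnes", "Martin"]))  # (4, 4)
```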
[Figure: Observation Density Graph. Observations per minute (0 to 20) plotted against media time (00:00 to 50:00).]
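You can get an observation density curve like the Observation Density Graph by binning observation timestamps into one-minute intervals of media time. Here is a minimal sketch. It assumes that each observation carries a single media timestamp in seconds, which the slides do not specify. The timestamps in the example are made up for illustration.

```python
import math
from collections import Counter


def observations_per_minute(timestamps_s: list[float], duration_s: float) -> list[int]:
    """Number of observations in each one-minute bin of media time.

    Timestamps outside [0, duration_s) are ignored; a partial last
    minute still gets its own bin.
    """
    n_bins = math.ceil(duration_s / 60)
    counts = Counter(int(t // 60) for t in timestamps_s if 0 <= t < duration_s)
    return [counts[m] for m in range(n_bins)]


# Invented timestamps (seconds) for a 3-minute excerpt.
print(observations_per_minute([5, 30, 61, 62, 179], 180))  # [2, 2, 1]
```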
Trial Run: Subjects
• 11 female + 13 male = 24 total
• University of Sheffield
• Three conditions:
  – “Guess”: no media whatsoever
  – “Base”: same media as Observers
  – “F1”: Ferret with Brno ASR transcript + slides + speaker segmentations
Results: Guess Condition
Subject     Answers  Correct  Incorrect  Score
A1              255      142        113  55.7%
A2              220      123         97  55.9%
A3              135       81         54  60.0%
Total           610      346        264  56.7%
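The Score column gives the fraction of a subject's answers that were correct. The Total row is consistent with pooling every answer (346/610) rather than averaging the per-subject scores. Each question offers two statements, so if exactly one of them is true, picking at random would be expected to score 50%. That makes 50% the natural reference point for the Guess condition. The sketch below recomputes the table from its Correct and Incorrect columns. `score` and `pooled` are illustrative helper names.

```python
# Correct and incorrect answers per subject, from the Guess condition table.
GUESS = {"A1": (142, 113), "A2": (123, 97), "A3": (81, 54)}


def score(correct: int, incorrect: int) -> float:
    """Fraction of a subject's answers that were correct."""
    return correct / (correct + incorrect)


def pooled(table: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Total correct, total answered, and their ratio across all subjects."""
    correct = sum(c for c, _ in table.values())
    answered = sum(c + i for c, i in table.values())
    return correct, answered, correct / answered


CHANCE = 0.5  # expected score when choosing one of two statements at random

for subject, (c, i) in GUESS.items():
    print(f"{subject}: {c + i} answers, {score(c, i):.1%}")
total_correct, total_answered, total_score = pooled(GUESS)
print(f"Total: {total_answered} answers, {total_score:.1%} (chance: {CHANCE:.0%})")
```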
Results: Base Condition
Subject     Answers  Correct  Incorrect  Score
B1               22       14          8    63%
B2               25       17          8    68%
B3               12        7          5    58%
B4                8        8          0   100%
B5                5        2          3    40%
B6                3        1          2    33%
B7               12        8          4    66%
B8                5        4          1    80%
B9                8        3          5    37%
B10              22       12         10    54%
B11               4        4          0   100%
Base Total      126       80         46  63.5%
Results: F1 Condition
Subject     Answers  Correct  Incorrect  Score
C1               20       11          9    55%
C2                6        3          3    50%
C3               18       17          1    94%
C4               21       12          9    57%
C5               18       11          7    61%
C6               11        7          4    63%
C7                6        6          0   100%
C8               14       10          4    71%
C9               12       11          1    91%
C10               7        2          5    28%
F1 Total        133       90         43  67.7%
Scores by Time
[Figure: Results by time, overlaid. Average score per question (0% to 70%) plotted against time left (0 to 23) for the Base, F1, and Guess conditions.]
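One way to build a per-question score curve over time is to group answers by how much time the subject had left when answering, then average correctness within each group. This is a sketch under assumptions. Each answer is taken to be logged as a (minutes left, correct) pair and grouped by whole minutes. The slides give neither the logging format nor the bucket size, and the example answers are made up.

```python
from collections import defaultdict


def score_by_time_left(answers: list[tuple[float, bool]]) -> dict[int, float]:
    """Fraction correct among answers given with m to m+1 minutes left, keyed by m."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for minutes_left, correct in answers:
        buckets[int(minutes_left)].append(correct)  # truncate to whole minutes
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}


# Invented answers for illustration.
example = [(21.5, True), (21.1, False), (3.2, True), (0.4, False)]
print(score_by_time_left(example))  # {0: 0.0, 3: 1.0, 21: 0.5}
```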
Proximity of Answers to Questions
[Figure: Media time difference histogram. Number of answers, split into Incorrect and Correct, plotted against media time difference in minutes (-45 to 45).]
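The histogram can be read as comparing two media positions: where the subject was when answering, and where the question's observation occurs. The sketch below makes several assumptions the slides do not confirm:
- each answer records both media times in minutes;
- the difference is answer minus question;
- bins are 10 minutes wide and keyed by their lower edge.

`difference_histogram` is a hypothetical helper, and the example data is made up.

```python
import math
from collections import Counter


def difference_histogram(answers, bin_minutes=10):
    """Count answers per bin of media time difference, split by correctness.

    answers: iterable of (answer_media_min, question_media_min, correct).
    Returns (correct_counts, incorrect_counts), Counters keyed by bin lower edge.
    """
    correct, incorrect = Counter(), Counter()
    for answer_t, question_t, ok in answers:
        diff = answer_t - question_t
        lower_edge = math.floor(diff / bin_minutes) * bin_minutes
        (correct if ok else incorrect)[lower_edge] += 1
    return correct, incorrect


# Invented example: differences of +2, -17 and +1 minutes.
right, wrong = difference_histogram([(12.0, 10.0, True), (3.0, 20.0, False), (15.0, 14.0, True)])
print(dict(right), dict(wrong))  # {0: 2} {-20: 1}
```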
Speed versus Accuracy
[Figure: Speed versus Accuracy graph. Percentage of questions correct (0% to 100%) plotted against questions answered (0 to 30); series: Base condition, F1 condition, Base mean, F1 mean, Guess mean.]
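Each Base or F1 subject plausibly contributes one point to this graph, with questions answered on one axis and the fraction answered correctly on the other. The sketch below derives those points from the Base and F1 results tables. The slides do not say how the mean markers were computed. `mean_point` assumes they are the arithmetic means of the per-subject coordinates, which is only one possible reading.

```python
from statistics import mean

# (answers, correct) per subject, from the Base and F1 results tables.
BASE = [(22, 14), (25, 17), (12, 7), (8, 8), (5, 2), (3, 1),
        (12, 8), (5, 4), (8, 3), (22, 12), (4, 4)]
F1 = [(20, 11), (6, 3), (18, 17), (21, 12), (18, 11),
      (11, 7), (6, 6), (14, 10), (12, 11), (7, 2)]


def points(rows):
    """(questions answered, fraction correct) for each subject."""
    return [(answered, correct / answered) for answered, correct in rows]


def mean_point(rows):
    """Assumed definition: mean of the per-subject x and y coordinates."""
    pts = points(rows)
    return mean(x for x, _ in pts), mean(y for _, y in pts)


for name, rows in (("Base", BASE), ("F1", F1)):
    x, y = mean_point(rows)
    print(f"{name}: {x:.1f} questions answered, {y:.1%} correct on average")
```

Under this reading, the mean accuracy differs from the pooled Total score in the tables. For Base, it is about 63.8%, compared with the pooled 63.5%.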