The Meeting Recorder Project at ICSI
Transcript of The Meeting Recorder Project at ICSI
CSI I
The Meeting Recorder Project atICSI
Jane Edwards, Adam Janin, Barbara Peskin and Chuck Wooters
Other project participants:
Jeremy Ang, Don Baron, Dan Ellis, David Gelbart, Nelson Morgan,
Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke
International Computer Science Institute
Berkeley, CA
USA
NIST 11/02/01 – p.1/23
CSI I
Outline
• Background• Current Status• Data Collection Process• Transcription Effort• Ability to contribute data• Open Issues• References
NIST 11/02/01 – p.2/23
CSI I
Background
Goal: to develop technology to “process” spokenlanguage from meetings
• Speaker change detection, speaker tracking.• SpeechCorder handheld portable device• Information retrieval• Dialog analysis/modeling• Speech recognition
• Far-field, digits, conversations, etc.
NIST 11/02/01 – p.3/23
CSI I
Background (cont.)
• Collaboration with UW, OGI, IBM, SRI,Columbia U.
• Collecting data since Feb 2000• Goal is 100 meeting-hours
NIST 11/02/01 – p.4/23
CSI I
Background (cont.)
Types of meetings we are collecting:• Regular, weekly group meetings• “Natural” data (meetings that would happen
even if we weren’t recording)• Close-talking and far-field microphones• Digits (a few simultaneous sessions)• Up to 10 speakers per meeting (averaging 6)• Few meeting types, but many tokens
NIST 11/02/01 – p.5/23
CSI I
Current Status
• 1017 total channel-hours recorded• 529 close-talking hours (3-10 channels per
meeting)• 487 far-field hours (6 channels per meeting)• 81 meeting-hours (85 meetings)• about 60 unique speakers
NIST 11/02/01 – p.6/23
CSI I
Data Collection Process
• Audio format: NIST Sphere,shortened (compressed), 16 KHz, 16 bit
• Up to 16 Channels (each in its own file):• 2 “PDA” mics• 4 PZM omni-directional (table-top) mics• 10 (max) close-talking (Sony R© and
Crown R©, mostly radio)• Skew (fixed by including offset info. for each
channel)
NIST 11/02/01 – p.7/23
CSI I
Meeting Room Hardware
Audio PC
ST
UD
I/O
PC
I car
dA/D 1
A/D 2
Wireless RX 5
2
2
2
2
2
2
6/8
TX1
TX2
TX3
TX4
TX5
PZM1
PZM2
PZM3
PZM4
Dummy PDA
Mackie mixer
JimBox PSU & breakout
Jimlet
Jimlet
Jimlet
Jimlet
Notes:
ADAT lightpipe
Hand mic
Wireless headsets
Computer headsets
Ambient mics
ADAT lightpipe
MainL/R
Aux1/2
ICSI Meeting Recorder Room Audio Setup 2000-05-05
All headsets are now wireless. 10/2001
NIST 11/02/01 – p.8/23
CSI I
Transcription Effort
• Transcription File Format• Transcription Tools• Transcription Process• What do we transcribe?• Transcript “Transformations”• Current Transcription Status
NIST 11/02/01 – p.9/23
CSI I
Transcription File Format
XML based on the following:• ETCA “Transcriber” tool.• TEI (Text Encoding Initiative), especially for
time concepts.• Annotated Transcription Graphs of Liberman,
Bird et. al. (ATLAS).
See “References” slides below for more details on
these items.
NIST 11/02/01 – p.10/23
CSI I
Transcription Tools
• trans → channeltrans• “linearizing” transcripts (for fast first-pass
transcription)
speechsilence beep <number> beep
"linearized""channelized"
12
3
4 1 2 3 4
NIST 11/02/01 – p.11/23
CSI I
Transcription Tools (cont.)
for the currentchannel
Transcription
All channels
Current chan
NIST 11/02/01 – p.12/23
CSI I
Transcription Process
Service
OutsideTranscription
Participants
Semi-automatic +2nd Transcriber +"Master" Transcriber
Students
ReviewandApproval
Checking
(Optional)TranscriptionFirst-pass
Audio
Transcription
NIST 11/02/01 – p.13/23
CSI I
What do we transcribe? (Part I)
• Speakers, channels• Words (plus abbreviations, acronyms, etc.)• Overlaps (recoverable from time marks)• Disfluencies (e.g. um, eh & interruptions)• Backchannels (e.g. uh-huh)• Non-canonical pronunciations• False-starts, interruptions, etc.
NIST 11/02/01 – p.14/23
CSI I
What do we transcribe? (Part II)
• Non-lexical events:• vocal: cough, laugh, breath, etc.• non-vocal: door slam, paper noise, etc.
• Acoustic uncertainty• Qualifying information & contextual remarks• “Bleeps”• Utterance boundaries (via standard
orthographic conventions)
NIST 11/02/01 – p.15/23
CSI I
Transcript “Transformations”
Master transcript to be transformed to application
specific versions.
(NIST STM)Speech Recog. Dialog Modeling ???
Master Transcript
Automatic Conversions
NIST 11/02/01 – p.16/23
CSI I
Current Transcription Status
0
20
40
60
80
100
10
Approved
29
Checked
49Transcribed
81Recorded
Hours (as of 10/31/01)
NIST 11/02/01 – p.17/23
CSI I
Ability to Contribute
• Our intention is to make the meeting corpusavailable to the community.
• Consent forms and Human Subjects approval• Planning to have all 100 meetings collected
by the end of 2001. Data to be transcribedand approved for release “soon” after that.
NIST 11/02/01 – p.18/23
CSI I
Open Issues
• How do we distribute the data?• Estimating 50 Gigs of data for 100 hours
• “Bleeping” vs. discarding entire meeting• What gets transcribed? (Can’t anticipate all
desired levels of annotation nor all potentialapplications.)
• Legal “responsibility” of organizationcollecting the data.
NIST 11/02/01 – p.19/23
CSI I
References
WWW:
• ETCA “Transcriber” tool:http://www.etca.fr/CTA/gip/Projects/Transcriber/
• TEI (Text Encoding Initiative):Main site: http://www.tei-c.org/XML for TEI Lite:http://www.oasis-open.org/cover/tei.html
• ATLAS (Architecture and Tools for Linguistic AnalysisSystems)http://www.nist.gov/speech/atlas/
NIST 11/02/01 – p.20/23
CSI I
References
WWW: (cont.)
• Annotation graphs (which are a special case ofATLAS):http://morph.ldc.upenn.edu/AG/
• EAGLES (Expert Advisory Group on LanguageEngineering Standards):http://www.ling.lancs.ac.uk/eagles/delivera/wp4aug1.html
• MATE (Multilevel Annotation, Tools Engineering):http://mate.nis.sdu.dk/
NIST 11/02/01 – p.21/23
CSI I
References (cont.)
Tools:
• Barras, C. Geoffrois, E., Wu, Z., and Liberman, M. (2001). Transcriber:
development and use of a tool for assisting speech corpora production. Speech
Communication, 33, 5-22. ("Transcriber" interface)
Markup:• Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., and Liberman, M. (2000)
"ATLAS: A flexible and extensible architecture for linguistic annotation."http://morph.ldc.upenn.edu/AG/
• Johansson, S. (1995). The approach of the Text Encoding Initiative to theencoding of spoken discourse. In G. N. Leech, G. Myers, J. Thomas (eds), SpokenEnglish on Computer: Transcript, Mark-up and Application (pp. 82–98). NY:Longman Publishing.
NIST 11/02/01 – p.22/23
CSI I
References (cont.)
Transcription:• Edwards, J. A. (1993). Principles and Contrasting Systems of Discourse
Transcription. In J. A. Edwards, and M. D. Lampert (eds).Talking Data: Transcription and Coding in Discourse Research (pp. 3–31).Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
• Edwards, J. A. (2001). The transcription of discourse. InThe Handbook of Discourse Analysis. Edited by Deborah Tannen, DeborahSchiffrin, and Heidi Hamilton. New York: Blackwell.
• Leech, G., Weisser, M., Grice, M. and Wilson, A. (1998). "Survey and guidelinesfor the representation and annotation of dialogues." In: Gibbon, D., Moore, R. andWinski, R. (2nd edition). Handbook of standards and resources for spokenlanguage systems. Berlin: Mouton de Gruyter.(http://www.ling.lancs.ac.uk/eagles/delivera/wp4aug1.html)
NIST 11/02/01 – p.23/23