The Meeting Recorder Project at ICSI

CSI I

The Meeting Recorder Project atICSI

Jane Edwards, Adam Janin, Barbara Peskin and Chuck Wooters

Other project participants:

Jeremy Ang, Don Baron, Dan Ellis, David Gelbart, Nelson Morgan,

Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke

International Computer Science Institute

Berkeley, CA

USA

NIST 11/02/01 – p.1/23

CSI I

Outline

• Background• Current Status• Data Collection Process• Transcription Effort• Ability to contribute data• Open Issues• References

NIST 11/02/01 – p.2/23

CSI I

Background

Goal: to develop technology to “process” spokenlanguage from meetings

• Speaker change detection, speaker tracking.• SpeechCorder handheld portable device• Information retrieval• Dialog analysis/modeling• Speech recognition

• Far-field, digits, conversations, etc.

NIST 11/02/01 – p.3/23

CSI I

Background (cont.)

• Collaboration with UW, OGI, IBM, SRI,Columbia U.

• Collecting data since Feb 2000• Goal is 100 meeting-hours

NIST 11/02/01 – p.4/23

CSI I

Background (cont.)

Types of meetings we are collecting:• Regular, weekly group meetings• “Natural” data (meetings that would happen

even if we weren’t recording)• Close-talking and far-field microphones• Digits (a few simultaneous sessions)• Up to 10 speakers per meeting (averaging 6)• Few meeting types, but many tokens

NIST 11/02/01 – p.5/23

CSI I

Current Status

• 1017 total channel-hours recorded• 529 close-talking hours (3-10 channels per

meeting)• 487 far-field hours (6 channels per meeting)• 81 meeting-hours (85 meetings)• about 60 unique speakers

NIST 11/02/01 – p.6/23

CSI I

Data Collection Process

• Audio format: NIST Sphere,shortened (compressed), 16 KHz, 16 bit

• Up to 16 Channels (each in its own file):• 2 “PDA” mics• 4 PZM omni-directional (table-top) mics• 10 (max) close-talking (Sony R© and

Crown R©, mostly radio)• Skew (fixed by including offset info. for each

channel)

NIST 11/02/01 – p.7/23

CSI I

Meeting Room Hardware

Audio PC

ST

UD

I/O

PC

I car

dA/D 1

A/D 2

Wireless RX 5

2

2

2

2

2

2

6/8

TX1

TX2

TX3

TX4

TX5

PZM1

PZM2

PZM3

PZM4

Dummy PDA

Mackie mixer

JimBox PSU & breakout

Jimlet

Jimlet

Jimlet

Jimlet

Notes:

ADAT lightpipe

Hand mic

Wireless headsets

Computer headsets

Ambient mics

ADAT lightpipe

MainL/R

Aux1/2

ICSI Meeting Recorder Room Audio Setup 2000-05-05

All headsets are now wireless. 10/2001

NIST 11/02/01 – p.8/23

CSI I

Transcription Effort

• Transcription File Format• Transcription Tools• Transcription Process• What do we transcribe?• Transcript “Transformations”• Current Transcription Status

NIST 11/02/01 – p.9/23

CSI I

Transcription File Format

XML based on the following:• ETCA “Transcriber” tool.• TEI (Text Encoding Initiative), especially for

time concepts.• Annotated Transcription Graphs of Liberman,

Bird et. al. (ATLAS).

See “References” slides below for more details on

these items.

NIST 11/02/01 – p.10/23

CSI I

Transcription Tools

• trans → channeltrans• “linearizing” transcripts (for fast first-pass

transcription)

speechsilence beep <number> beep

"linearized""channelized"

12

3

4 1 2 3 4

NIST 11/02/01 – p.11/23

CSI I

Transcription Tools (cont.)

for the currentchannel

Transcription

All channels

Current chan

NIST 11/02/01 – p.12/23

CSI I

Transcription Process

Service

OutsideTranscription

Participants

Semi-automatic +2nd Transcriber +"Master" Transcriber

Students

ReviewandApproval

Checking

(Optional)TranscriptionFirst-pass

Audio

Transcription

NIST 11/02/01 – p.13/23

CSI I

What do we transcribe? (Part I)

• Speakers, channels• Words (plus abbreviations, acronyms, etc.)• Overlaps (recoverable from time marks)• Disfluencies (e.g. um, eh & interruptions)• Backchannels (e.g. uh-huh)• Non-canonical pronunciations• False-starts, interruptions, etc.

NIST 11/02/01 – p.14/23

CSI I

What do we transcribe? (Part II)

• Non-lexical events:• vocal: cough, laugh, breath, etc.• non-vocal: door slam, paper noise, etc.

• Acoustic uncertainty• Qualifying information & contextual remarks• “Bleeps”• Utterance boundaries (via standard

orthographic conventions)

NIST 11/02/01 – p.15/23

CSI I

Transcript “Transformations”

Master transcript to be transformed to application

specific versions.

(NIST STM)Speech Recog. Dialog Modeling ???

Master Transcript

Automatic Conversions

NIST 11/02/01 – p.16/23

CSI I

Current Transcription Status

0

20

40

60

80

100

10

Approved

29

Checked

49Transcribed

81Recorded

Hours (as of 10/31/01)

NIST 11/02/01 – p.17/23

CSI I

Ability to Contribute

• Our intention is to make the meeting corpusavailable to the community.

• Consent forms and Human Subjects approval• Planning to have all 100 meetings collected

by the end of 2001. Data to be transcribedand approved for release “soon” after that.

NIST 11/02/01 – p.18/23

CSI I

Open Issues

• How do we distribute the data?• Estimating 50 Gigs of data for 100 hours

• “Bleeping” vs. discarding entire meeting• What gets transcribed? (Can’t anticipate all

desired levels of annotation nor all potentialapplications.)

• Legal “responsibility” of organizationcollecting the data.

NIST 11/02/01 – p.19/23

CSI I

References

WWW:

• ETCA “Transcriber” tool:http://www.etca.fr/CTA/gip/Projects/Transcriber/

• TEI (Text Encoding Initiative):Main site: http://www.tei-c.org/XML for TEI Lite:http://www.oasis-open.org/cover/tei.html

• ATLAS (Architecture and Tools for Linguistic AnalysisSystems)http://www.nist.gov/speech/atlas/

NIST 11/02/01 – p.20/23

CSI I

References

WWW: (cont.)

• Annotation graphs (which are a special case ofATLAS):http://morph.ldc.upenn.edu/AG/

• EAGLES (Expert Advisory Group on LanguageEngineering Standards):http://www.ling.lancs.ac.uk/eagles/delivera/wp4aug1.html

• MATE (Multilevel Annotation, Tools Engineering):http://mate.nis.sdu.dk/

NIST 11/02/01 – p.21/23

CSI I

References (cont.)

Tools:

• Barras, C. Geoffrois, E., Wu, Z., and Liberman, M. (2001). Transcriber:

development and use of a tool for assisting speech corpora production. Speech

Communication, 33, 5-22. ("Transcriber" interface)

Markup:• Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., and Liberman, M. (2000)

"ATLAS: A flexible and extensible architecture for linguistic annotation."http://morph.ldc.upenn.edu/AG/

• Johansson, S. (1995). The approach of the Text Encoding Initiative to theencoding of spoken discourse. In G. N. Leech, G. Myers, J. Thomas (eds), SpokenEnglish on Computer: Transcript, Mark-up and Application (pp. 82–98). NY:Longman Publishing.

NIST 11/02/01 – p.22/23

CSI I

References (cont.)

Transcription:• Edwards, J. A. (1993). Principles and Contrasting Systems of Discourse

Transcription. In J. A. Edwards, and M. D. Lampert (eds).Talking Data: Transcription and Coding in Discourse Research (pp. 3–31).Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

• Edwards, J. A. (2001). The transcription of discourse. InThe Handbook of Discourse Analysis. Edited by Deborah Tannen, DeborahSchiffrin, and Heidi Hamilton. New York: Blackwell.

• Leech, G., Weisser, M., Grice, M. and Wilson, A. (1998). "Survey and guidelinesfor the representation and annotation of dialogues." In: Gibbon, D., Moore, R. andWinski, R. (2nd edition). Handbook of standards and resources for spokenlanguage systems. Berlin: Mouton de Gruyter.(http://www.ling.lancs.ac.uk/eagles/delivera/wp4aug1.html)

NIST 11/02/01 – p.23/23

The Meeting Recorder Project at ICSI

Documents

Transcript of The Meeting Recorder Project at ICSI