Technical Aspects of the CALO Recorder
![Page 1: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/1.jpg)
Technical Aspects of the CALO Recorder
By Satanjeev Banerjee, Thomas Quisel, Jason Cohen, Arthur Chan, Yitao Sun, David Huggins-Daines, Alex Rudnicky
![Page 2: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/2.jpg)
Role of the CALO Recorder
A centralized mechanism to collect all perceptual events:
  Speech
  Text
CMU provides technology for:
  Event recording
  Speech recognition
![Page 3: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/3.jpg)
Role of the CALO Recorder
One of the components of CAMPER. The four:
  CALO recorder
  Speechalizer
    End-pointing information
    Prosodic information
    Speech recognition
  CAMSeg
    Speech segmentation
  Understanding
![Page 4: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/4.jpg)
An Architecture Diagram (Client Side)
[Diagram blocks: audio capturing; text capturing through keyboard; ring buffers; end-pointer; VU meter; speech decoder; other events; storage]
![Page 5: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/5.jpg)
Persistence of Data
Background Intelligent Transfer Service (BITS) is used to transfer data off-line.
![Page 6: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/6.jpg)
Technical Challenges in the Recorder
Threading
Audio buffering
Time synchronization
Real-time processing
  End-pointing
  Speech processing
Portability
Maintenance/distribution
![Page 7: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/7.jpg)
Threading
Several kinds of processing need to run concurrently:
  VU meter
  Speech processing and higher-level understanding
  Graphical user interface
A long development time was invested to make the communication between threads correct. (By Thomas Quisel)
See the architecture diagram on the next slides.
Example issue: on some platforms, the WX implementation makes the GUI thread disallow other threads from calling its drawing functions.
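One common way to live with a GUI toolkit that only lets its own thread draw is to have the other threads post messages to a queue that the GUI thread drains. The sketch below illustrates that pattern only; it is not the recorder's code, and `vu_worker`, `gui_loop`, `run_demo`, and the `draw` callback are hypothetical names standing in for the real VU meter and drawing calls.

```python
import queue
import threading

def vu_worker(levels, updates):
    """Worker thread: computes levels but never draws; it only posts messages."""
    for level in levels:
        updates.put(("vu", level))
    updates.put(("done", None))  # tell the GUI loop that no more updates will come

def gui_loop(updates, draw):
    """Runs on the GUI thread: the only place where draw() is ever called."""
    while True:
        kind, value = updates.get()  # blocks until a worker posts something
        if kind == "done":
            break
        draw(value)

def run_demo(levels):
    """Start one worker and run the GUI loop on the calling thread."""
    updates = queue.Queue()  # thread-safe hand-off between the two threads
    drawn = []               # (thread id, level) pairs recorded by draw()

    def draw(level):
        # Stand-in for a toolkit drawing call (assumption: no real GUI here).
        drawn.append((threading.get_ident(), level))

    worker = threading.Thread(target=vu_worker, args=(levels, updates))
    worker.start()
    gui_loop(updates, draw)
    worker.join()
    return drawn
```

Calling `run_demo([0.1, 0.5, 0.9])` records three draw calls, all made from the calling thread, even though the levels were produced on the worker thread.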
![Page 8: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/8.jpg)
![Page 9: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/9.jpg)
Audio Buffering
The libaudio of Sphinx 2 and 3.X requires the caller to:
  Capture audio
  Do processing on the audio buffer
If the processing thread is even slightly slower than 1xRT, audio will be lost.
(By Jason Cohen) A ring buffer structure is implemented.
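A ring buffer decouples the capture thread from the processing thread, so a briefly slow consumer no longer loses audio. The following is a minimal sketch of the general data structure, not the recorder's implementation; the class name `RingBuffer`, its methods, and the overwrite-oldest policy when the buffer is full are all assumptions made for illustration.

```python
import threading

class RingBuffer:
    """Fixed-capacity FIFO of audio samples shared by two threads."""

    def __init__(self, capacity):
        self._data = [0] * capacity
        self._capacity = capacity
        self._read = 0    # index of the oldest unread sample
        self._count = 0   # number of unread samples
        self.dropped = 0  # samples overwritten before they were read
        self._lock = threading.Lock()

    def write(self, samples):
        """Called by the capture thread; does not wait for the reader to catch up."""
        with self._lock:
            for s in samples:
                pos = (self._read + self._count) % self._capacity
                self._data[pos] = s
                if self._count == self._capacity:
                    # Buffer full: the oldest sample was just overwritten.
                    self._read = (self._read + 1) % self._capacity
                    self.dropped += 1
                else:
                    self._count += 1

    def read(self, max_samples):
        """Called by the processing thread; returns up to max_samples, oldest first."""
        with self._lock:
            n = min(max_samples, self._count)
            out = [self._data[(self._read + i) % self._capacity] for i in range(n)]
            self._read = (self._read + n) % self._capacity
            self._count -= n
            return out
```

With enough capacity the `dropped` counter stays at zero as long as the processing thread keeps up on average; a consumer that stays slower than 1xRT will still fall behind eventually, and the counter makes that visible.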
![Page 10: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/10.jpg)
Time Synchronization (by David Huggins-Daines)
Simple NTP (SNTP) is used to get Coordinated Universal Time (UTC) from an arbitrary NTP server
  Clone of the standard NTP implementation
Internal synchronization
  Synchronization time between machines: 50-60 ms
The major challenge is the delay imposed by the OS and the audio capturing software.
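For reference, an SNTP client estimates its clock offset from the four timestamps of one request/reply exchange using the standard NTP formulas. The sketch below shows only that arithmetic; the function name and the example timestamps are illustrative assumptions, and no network code or recorder-specific logic is shown.

```python
def sntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP clock-offset and round-trip-delay estimate, in seconds.

    t0: client time when the request was sent
    t1: server time when the request was received
    t2: server time when the reply was sent
    t3: client time when the reply was received
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0  # how far the client clock lags the server
    delay = (t3 - t0) - (t2 - t1)           # network round trip, excluding server time
    return offset, delay

def to_synchronized_time(local_timestamp, offset):
    """Map a local event timestamp onto the server's time base."""
    return local_timestamp + offset
```

For example, if the client clock is 0.5 s behind the server and each network leg takes 20 ms, the estimated offset is 0.5 s and the delay is 40 ms. The estimate assumes the two network legs take equal time; it does not account for the OS and audio-capture latency that the slide names as the major challenge.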
![Page 11: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/11.jpg)
Real-time Processing
Role of end-pointing and recognition
  After a long debate, a two-stage end-pointing and recognition architecture was chosen
By Ziad
A high-performance end-pointing routine was created
  Gaussian Mixture Model-based end-pointer, implemented as a frame voter within segments
  The parameters are further manually tuned
  Speed optimized
  Now in s3ep, a customized version of Sphinx
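The idea of a GMM-based end-pointer that votes over frames within a segment can be sketched as follows. This is one possible reading of the description, not the s3ep code: the two-class setup (speech versus non-speech), the fixed segment length, the majority rule, and every model parameter are assumptions chosen for illustration.

```python
import math

def diag_gmm_loglik(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per mixture component.
    """
    comp_logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for x, m, v in zip(frame, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        comp_logs.append(ll)
    top = max(comp_logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

def vote_segments(frames, speech_gmm, noise_gmm, seg_len=5):
    """Label each fixed-length segment as speech (True) or non-speech (False).

    Every frame votes for the class whose GMM scores it higher; a segment is
    speech when more than half of its frames vote speech.
    """
    labels = []
    for start in range(0, len(frames), seg_len):
        segment = frames[start:start + seg_len]
        votes = sum(
            1 for f in segment
            if diag_gmm_loglik(f, speech_gmm) > diag_gmm_loglik(f, noise_gmm)
        )
        labels.append(votes * 2 > len(segment))
    return labels
```

Voting over a segment smooths out isolated misclassified frames, which is one plausible reason to prefer it over frame-by-frame decisions; in a two-stage design, only the segments labeled as speech would be passed on to the recognizer.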
![Page 12: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/12.jpg)
![Page 13: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/13.jpg)
Speech Recognizer
The resulting output is fed to the recognizer.
Speech recognition in meetings
  Regarded as one of the biggest challenges in the field
  Results vary widely with meeting style, number of attendees, topics, and speaker disfluencies
![Page 14: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/14.jpg)
Accuracy
Performance is still under heavy work. Currently:
In the cleanest meeting (Bdb001)
  With one very dominant male speaker
  With one very dominant female speaker
  Speaker speaking rate entropy is lowest
  Error rate: 29.4%
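Assuming the error rate quoted above is the usual word error rate (the slide does not say which metric is used), it can be computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name and example sentences below are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion of a reference word
                cur[j - 1] + 1,           # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the meeting starts at noon", "the meeting start at noon today")` counts one substitution and one insertion against five reference words, giving 0.4.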
![Page 15: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/15.jpg)
Phase IV of Accuracy Improvement (Core)
Boosting-based training
Confidence-based N-best re-ranking
Speaker adaptation based on transformation
Speaker normalization
Include BN, SWB material in LM training
Dictionary refinement
![Page 16: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/16.jpg)
Phase IV of Accuracy Improvement (Optional)
STC
MLLT
DT
PLP, TRAP
LM with disfluencies and back-channeling
![Page 17: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/17.jpg)
Speed (2.2G machine)
Communicator
  S2: 17.3%, 0.34xRT
  S3.X BL: 11.8%, 4xRT
  S3.X Tuned: 12.8%, 0.87xRT
WSJ 5k
  S3.X BL: 7.4%, 1.61xRT
  S3.X BL: 8.3%, 0.5xRT
ICSI
  With tuning of SVQ and CIGMMS, 0.7xRT is achieved
  We may be able to tune the results further
  Benchmarking results need time to be prepared
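The xRT figures above are real-time factors: processing time divided by the duration of the audio processed, so anything below 1xRT keeps up with live input. A small sketch of that arithmetic follows; the function names are hypothetical, and the example inputs are chosen only to reproduce factors quoted on the slide.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; 1.0 means exactly real time."""
    return processing_seconds / audio_seconds

def keeps_up(rtf):
    """True when a live stream can be processed without falling behind."""
    return rtf <= 1.0

def backlog_after(audio_seconds, rtf):
    """Seconds of processing still pending once audio_seconds have been captured."""
    return max(0.0, (rtf - 1.0) * audio_seconds)
```

For instance, decoding 10 s of audio in 3.4 s gives 0.34xRT, while a 1.61xRT configuration accumulates about 36.6 s of pending work over a 60 s recording, which is the situation where buffering alone stops helping.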
![Page 18: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/18.jpg)
Maintenance and Distribution
All code is in a local CVS
  C, Java
  Will soon move to SRI
Regular releases are created; use of SRI's CVS will blur this line.
![Page 19: Technical Aspects of the CALO Recorder](https://reader037.fdocuments.in/reader037/viewer/2022102818/56813e10550346895da7ef00/html5/thumbnails/19.jpg)
Conclusion
Engineering work is mostly done for the recorder
Time to improve individual components.
Everyone is welcome to join the effort.