Speech tools
Jean-Philippe Goldman
03.03.2004
Two questions
What kind of data?
Which task?
What kind of data?
Speech content: noise, multiple voices, …
Data files: sound, transcription, pitch curve
Sampling/quantization: sampling rates such as 16k, 12k, 8k, 4k Hz; 8-bit quantization
Size: 16 kHz, 16 bit (mono) = 256 kbps, i.e. about 1.9 MB/min or 115 MB/h
Sound formats: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere
Transcription formats: HTK, TIMIT, TextGrid, Phondat
Number of files
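The size figures above follow from bit rate = sampling rate × bits per sample × channels. The Python sketch below reproduces them and uses the standard-library `wave` module to read the sampling rate and quantization of a WAV file. The helper names (`bitrate_bps`, `storage_bytes`, `describe_wav`) are illustrative, sizes ignore the WAV header, and MB means 10^6 bytes.

```python
import io
import wave


def bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def storage_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
    """Size of the raw sample data in bytes (file headers are ignored)."""
    return bitrate_bps(sample_rate_hz, bits_per_sample, channels) * seconds // 8


def describe_wav(source):
    """Read sampling rate, quantization, channel count and duration of a WAV file.

    `source` may be a path or a binary file-like object.
    """
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        bits = 8 * w.getsampwidth()
        channels = w.getnchannels()
        seconds = w.getnframes() / rate
    return {
        "rate_hz": rate,
        "bits": bits,
        "channels": channels,
        "seconds": seconds,
        "kbps": bitrate_bps(rate, bits, channels) / 1000,
    }


if __name__ == "__main__":
    print(bitrate_bps(16000, 16))                # 256000 bit/s, i.e. 256 kbps
    print(storage_bytes(16000, 16, 60) / 1e6)    # 1.92 (MB per minute)
    print(storage_bytes(16000, 16, 3600) / 1e6)  # 115.2 (MB per hour)

    # One second of 16 kHz / 16-bit mono silence, written to memory and read back.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    buf.seek(0)
    print(describe_wav(buf))
```

Halving the sampling rate or the bit depth halves the storage, which is why the data question (sampling, quantization, number of files) matters before choosing a tool.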
Which task?
Visualization and editing: record, play, edit, mix, add effects
Analysis: spectral, pitch
Speech manipulation: filtering, mixing, adding effects, prosodic manipulation
Annotation: segmentation, labeling
Scripting: batch processing, communication with other programs
Plotting
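To make the "Analysis: pitch" task concrete, here is a bare-bones autocorrelation pitch estimator in Python. It is a sketch, not the algorithm of any tool discussed here: the 60 to 400 Hz search range and the function name `autocorr_pitch` are assumptions, and real trackers (Praat's included) add windowing, normalization, sub-sample interpolation and voicing decisions.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the F0 (Hz) of one voiced frame from its autocorrelation peak.

    Only lags between sample_rate/fmax and sample_rate/fmin are searched.
    Returns None for a silent (all-zero) frame. No voicing decision is made.
    """
    n = len(frame)
    min_lag = math.ceil(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    if all(x == 0 for x in frame):
        return None
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(2000)]  # 200 Hz, 0.25 s
    print(autocorr_pitch(tone, fs))  # 200.0
```

A periodic signal correlates best with itself shifted by one period, so the lag of the highest peak gives the period and sample_rate / lag gives F0.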
Examples of tasks
Build stimuli for an experiment (e.g. cross-splicing)
Manage a speech database for a TTS engine
Create a prosodic database
Analyze a speech corpus from experiment recordings
Verify/correct an automatic segmentation
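Cross-splicing joins part of one recording to part of another. The Python sketch below shows only the joining step, with a short linear crossfade to avoid an audible click at the boundary. It assumes both segments are lists of float samples at the same sampling rate, already cut at suitable points (for instance near zero crossings); the function name and the linear fade shape are illustrative choices.

```python
def cross_splice(a, b, fade):
    """Join the end of segment `a` to the start of segment `b`.

    The last `fade` samples of `a` overlap the first `fade` samples of `b`
    with a linear crossfade; fade=0 gives a plain cut-and-paste.
    """
    if not 0 <= fade <= min(len(a), len(b)):
        raise ValueError("fade must be between 0 and the shorter segment length")
    head = a[: len(a) - fade]
    overlap = []
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # weight of b rises from about 0 to about 1
        overlap.append((1 - w) * a[len(a) - fade + i] + w * b[i])
    return head + overlap + b[fade:]


if __name__ == "__main__":
    out = cross_splice([1.0] * 10, [0.0] * 10, fade=4)
    print(len(out))  # 16
    print([round(x, 2) for x in out])
```

The result is `fade` samples shorter than the two segments laid end to end, since those samples are shared.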
Two questions
What kind of data? Which task?
Two rules
There is no single tool that does everything.
There are plenty of ways to do any one thing.
Tool features
Visualization/editing, analysis, speech manipulation, annotation, scripting, plotting
Supported formats, platform/installation, evolution/community, accessibility, price
Software
Goldwave (audio editor)
ESPS/xwaves (routines + visualization)
Praat (speech analysis)
WaveSurfer (speech editor)
Transcriber (annotation tool)
Matlab (general-purpose software)
OGI speech tools (routines + application development)
… WinPitch, PitchWorks, Phonedit, CoolEdit …
Goldwave
Describes itself as a "top rated, professional digital audio editor"
Goldwave
pros: editing (good memory management for large files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chaining, easy interface
cons: nothing specific to speech (pitch, formants), Windows only, no scripting
Good for editing files, not for speech analysis
ESPS - waves
Developed by Entropic and AT&T; now public. The comp.speech FAQ says:
ESPS: a comprehensive set of speech analysis/processing tools
waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch) that includes a signal labeling utility
ESPS - waves
pros: powerful, designed for large files
cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped
Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
Praat
pros: designed for speech analysis (not only sound editing or spectrogram display), nice GUI, scripting, active development and community, prosodic manipulation
cons: limited scripting language, Praat-specific (native) formats for transcription and pitch files
WaveSurfer
Open-source tool for sound visualization and manipulation
Speech/sound analysis and sound annotation/transcription
Platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins, or embedding WaveSurfer visualization components in other applications
Requires the Snack Sound Toolkit
Transcriber
Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation
Nice, simple GUI
No speech analysis
Matlab (MathWorks)
Mathematical environment
Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
Third-party speech packages:
voicebox (2002)
pitch determination algorithm (2002), Xuejing Sun
colea speech editor (1998), Philip Loizou, University of Texas at Dallas
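Linear prediction, one of the toolbox functions listed above, can be illustrated outside Matlab. The Python sketch below implements the autocorrelation method with the Levinson-Durbin recursion; it is not the toolbox's code. Sign convention is an explicit choice here: the returned weights predict x[n] as a[0]·x[n-1] + … + a[p-1]·x[n-p], whereas Matlab's `lpc` returns the prediction-error filter, i.e. 1 followed by the negated weights.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a list of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Linear prediction by the autocorrelation method (Levinson-Durbin).

    Returns (a, err): a[j-1] weights x[n-j] in the predictor
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order],
    and err is the final prediction-error energy.
    """
    r = autocorrelation(x, order)
    if r[0] == 0:
        raise ValueError("signal has zero energy")
    a = [0.0] * (order + 1)  # a[0] unused so indices match the usual notation
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


if __name__ == "__main__":
    # Impulse response of the all-pole filter 1 / (1 - 1.3 z^-1 + 0.4 z^-2).
    x = [1.0, 1.3]
    for n in range(2, 400):
        x.append(1.3 * x[n - 1] - 0.4 * x[n - 2])
    coeffs, err = lpc(x, 2)
    print(coeffs)  # close to [1.3, -0.4]
```

Fitting the impulse response of a known all-pole filter is a convenient self-check: the recovered weights should match the filter's own coefficients.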
Matlab (MathWorks)
pros: open, powerful, scripting, excellent plotting
cons: small speech community, lack of standard formats, not designed for large files
OGI Speech Tools / CSLU Toolkit
Development started in 1992, in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI. Includes:
An X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels and other information
A set of C library routines (LIBNSPEECH): utilities for converting file formats, filtering, neural network training, vector quantization, and a database utility to automate speech-database queries
A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
Man pages
RAD: Rapid Application Development
Points of entry at three levels: package (C), script (Tcl), GUI (Tk)
Free for research use
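Summary
Compared on: editing, analysis, manipulation, annotation, scripting, plotting, format, OS, evolution, community, price
Goldwave: Windows; $40
ESPS/waves: C, sh; Unix; free
Praat: yes; native formats; console, sendpraat; source available; free
WaveSurfer + Snack: C, Tcl/Tk, Python; source available; free
Transcriber: XML; free
OGI Toolkit: free
Matlab + Signal Processing Toolbox + packages: native; no; BSD; student $100, $40 per toolbox
Key: [symbol lost] = yes, but requires some development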
Expect to do conversions
Sound files: Goldwave (Windows), sox (Unix)
Transcription files: scripts to convert text-formatted label files
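As an example of such a script, the Python sketch below converts an HTK label file into a single-tier Praat TextGrid (long text format), two of the transcription formats listed earlier. Assumptions: one "start end label" entry per line with times in HTK's 100 ns units, entries sorted and non-overlapping; extra columns, label-only lines and multi-level labels are ignored. The tier name "phones" and the function names are hypothetical. Gaps are filled with empty intervals because Praat expects a tier's intervals to cover its whole time range.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_htk_lab(text):
    """Parse 'start end label' lines into (start_s, end_s, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or label-only lines are skipped in this sketch
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        segments.append((start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, label))
    return segments


def praat_quote(s):
    """Quote a string for a Praat text file (inner double quotes are doubled)."""
    return '"' + s.replace('"', '""') + '"'


def to_textgrid(segments, tier_name="phones"):
    """Write one IntervalTier in Praat's long TextGrid text format."""
    intervals, t = [], 0.0
    for start, end, label in segments:
        if start > t:
            intervals.append((t, start, ""))  # fill a gap with an empty interval
        intervals.append((start, end, label))
        t = end
    xmax = t
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax!r}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {praat_quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {xmax!r}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start!r}",
            f"            xmax = {end!r}",
            f"            text = {praat_quote(label)}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    lab = "0 1500000 sil\n1500000 3000000 a\n3500000 5000000 b\n"
    print(to_textgrid(parse_htk_lab(lab)))
```

The reverse direction (TextGrid to HTK) follows the same pattern: read the intervals, multiply times by 10^7 and write one line per non-empty label.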
Links
www.goldwave.com (Goldwave)
www.speech.kth.se/software/#esps (ESPS)
www.praat.org (Praat)
www.speech.kth.se/software/#wavesurfer (WaveSurfer)
www.cse.ogi.edu/toolkit (CSLU Toolkit)
www.mathworks.com (Matlab)
www.lpl.univ-aix.fr/~sqlab/ (Phonedit)
www.sciconrd.com/pworks.htm (PitchWorks)
www.winpitch.com (WinPitch)
www.adobe.com (CoolEdit, now Adobe Audition)