The 1995 International Conference on
Acoustics, Speech, and Signal Processing
CONFERENCE
PROCEEDINGS
VOLUME 1:
Speech
Sponsored by The Signal Processing Society of
The Institute of Electrical and Electronics Engineers
May 9-12, 1995
Westin Hotel, Detroit, Michigan, U.S.A.
95CH35732
Volume I
SPEECH
TABLE OF CONTENTS
CELP CODING
Chair: Jean-Pierre Adoul, University of Sherbrooke (CANADA)
4KBPS Improved Pitch Prediction CELP Speech Coding with 20ms Frame 1
Masahiro Serizawa, Kazunori Ozawa, NEC Corporation (JAPAN)
A Low-Complexity Toll-Quality Variable Bit Rate Coder for CDMA Cellular Systems 5
Peter Kroon, Michael Recchione, AT&T Bell Laboratories (USA)
Toll Quality 16 kb/s CELP Speech Coding with Very Low Complexity 9
Juin-Hwey Chen, AT&T Bell Laboratories (USA)
CELP Coding Using Trellis-Coded Vector Quantization of the Excitation 13
Andrei Popescu, Nicolas Moreau, Telecom Paris; Claude Lamblin, CNET/LAA/TSS/CMC (FRANCE)
Interpolating the History - Improved Excitation Coding for High Quality CELP Coding 17
Per Hedelin, Thomas Eriksson, Chalmers University of Technology (SWEDEN)
Fast Stochastic Codebook Search Through the Use of Odd-Symmetric Crosscorrelation Basis Vectors 21
Cheung-Fat Chan, City University of Hong Kong (HONG KONG)
Improvements of Background Sound Coding in Linear Predictive Speech Coders 25
Torbjörn Wigren, Anders Bergström, Susanne Harrysson, Fredrik Jansson, Hans Nilsson, Ericsson Radio Systems AB (SWEDEN)
Improved CS-CELP Speech Coding in a Noisy Environment Using a Trained Sparse Conjugate Codebook 29
Akitoshi Kataoka, Sachiko Hosaka, NTT Human Interface Labs; Jotaro Ikedo, NTT Wireless Systems Laboratories; Takehiro Moriya, Shinji Hayashi, NTT Human Interface Labs. (JAPAN)
CELP Coding Based on Mel-cepstral Analysis 33
Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)
An Embedded Scheme for Regular Pulse Excited (RPE) Linear Predictive Coding 37
Shude Zhang, Gordon Lockhart, University of Leeds (UK)
RECOGNITION: LARGE VOCABULARY
Chair: Michael Picheny, IBM (USA)
Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task 41
L.R. Bahl, S. Balakrishnan-Aiyer, J.R. Bellegarda, M. Franz, P.S. Gopalakrishnan, D. Nahamoo, M. Novak, M. Padmanabhan, M.A. Picheny, S. Roukos, IBM (USA)
New Developments in the Lincoln Stack-Decoder Based Large Vocabulary CSR System 45
Douglas B. Paul, MIT Lincoln Laboratory (USA)
Large Vocabulary Continuous Speech Recognition Using Word Graphs 49
Xavier Aubert, Philips GmbH Research Laboratories - Aachen; Hermann Ney, Aachen University of Technology (GERMANY)
Reducing Word Error Rate on Conversational Speech from the Switchboard Corpus 53
P. Jeanrenaud, E. Eide, U. Chaudhari, J. McDonough, K. Ng, M. Shi, H. Gish, BBN Systems and Technologies (USA)
Golden Mandarin (III) - A User-Adaptive Prosodic-Segment-Based Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary 57
Ren-Yuan Lyu, National Taiwan University; Lee-Feng Chien, Academia Sinica; Shiao-Hong Hwang, Hung-Yun Hsieh, Rung-Chiuan Yang, Bo-Ren Bai, Jia-Chi Weng, Yen-Ju Yang, Shi-Wei Lin, National Taiwan University; Keh-Jiann Chen, Chiu-Yu Tseng, Lin-Shan Lee, Academia Sinica (REPUBLIC OF CHINA)
Complete Recognition of Continuous Mandarin Speech for Chinese Language with Very Large Vocabulary but Limited Training Data 61
Hsin-min Wang, Jia-lin Shen, Yen-Ju Yang, National Taiwan University; Chiu-yu Tseng, Academia Sinica; Lin-shan Lee, National Taiwan University (REPUBLIC OF CHINA)
Developments in Continuous Speech Dictation Using the ARPA WSJ Task 65
J.L. Gauvain, L. Lamel, M. Adda-Decker, LIMSI-CNRS (FRANCE)
Recent Improvements to the ABBOT Large Vocabulary CSR System 69
M.M. Hochberg, Cambridge University; S.J. Renals, University of Sheffield; A.J. Robinson, G.D. Cook, Cambridge University (ENGLAND)
The 1994 HTK Large Vocabulary Speech Recognition System 73
P.C. Woodland, C.J. Leggetter, J.J. Odell, V. Valtchev, S.J. Young, Cambridge University (UK)
Tangerine: A Large Vocabulary Mandarin Dictation System 77
Yuqing Gao, Hsiao-Wuen Hon, Zhiwei Lin, Gareth Loudon, S. Yogananthan, Baosheng Yuan, National University of Singapore (SINGAPORE)
ASR SYSTEM & CORPORA
Chair: John Bridle, Diagem (USA)
WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition 81
Tony Robinson, Jeroen Fransen, David Pye, Jonathan Foote, Steve Renals, Cambridge University (UK)
Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish 85
Yeshwant Muthusamy, Edward Holliman, Barbara Wheatley, Texas Instruments; Joseph Picone, Mississippi State University; John Godfrey, University of Pennsylvania (USA)
Implementation of the POW (Phonetically Optimized Words) Algorithm for Speech Database 89
Yeonja Lim, Youngjik Lee, ETRI (KOREA)
Microsoft Windows Highly Intelligent Speech Recognizer: Whisper 93
Xuedong Huang, Alex Acero, Fil Alleva, Mei-Yuh Hwang, Li Jiang, Milind Mahajan, Microsoft Corporation (USA)
Concept-Based Speech Translation 97
L. Mayfield, M. Gavalda, W. Ward, A. Waibel, Carnegie Mellon University (USA)
PhoneBook: A Phonetically-Rich Isolated-Word Telephone-Speech Database 101
John F. Pitrelli, Cynthia Fong, Suk H. Wong, Judith R. Spitz, Hong C. Leung, NYNEX Science & Technology, Inc. (USA)
CTIMIT: A Speech Corpus for the Cellular Environment with Applications to Automatic Speech Recognition 105
Kathy L. Brown, E. Bryan George, Lockheed Sanders, Inc. (USA)
Toward Movement-Invariant Automatic Lip-Reading and Speech Recognition 109
Paul Duchnowski, University of Karlsruhe (GERMANY); Martin Hunke, Carnegie Mellon University (USA); Dietrich Büsching, Uwe Meier, University of Karlsruhe (GERMANY); Alex Waibel, Carnegie Mellon University (USA)
Some Results with a Trainable Speech Translation and Understanding System 113
V.M. Jimenez, A. Castellanos, E. Vidal, Universidad Politecnica de Valencia (SPAIN)
A Continuous Speech Recognition System Using Finite State Network and Viterbi Beam Search for the Automatic Interpretation 117
Nam-Yong Han, Hoi-Rin Kim, Kyu-Woong Hwang, Young-Mok Ahn, Joon-Hyung Ryoo, ETRI (KOREA)
ROBUST SPEECH RECOGNITION
Chair: Richard Stern, Carnegie Mellon University (USA)
Robust Speech Recognition Based on Stochastic Matching 121
Ananth Sankar, SRI International; Chin-Hui Lee, AT&T Bell Laboratories (USA)
On the Robustness of Linear Discriminant Analysis as a Preprocessing Step for Noisy Speech Recognition 125
Olivier Siohan, CRIN-CNRS & INRIA-Lorraine (FRANCE)
A Maximum Likelihood Procedure for a Universal Adaptation Method Based on HMM Composition 129
Yasuhiro Minami, Sadaoki Furui, NTT Human Interface Laboratories (JAPAN)
A Fast and Flexible Implementation of Parallel Model Combination 133
M.J.F. Gales, S.J. Young, Cambridge University (UK)
Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition 137
Pedro J. Moreno, Bhiksha Raj, Evandro Gouvea, Richard M. Stern, Carnegie Mellon University (USA)
Robust Speech Recognition in Noise Using Adaptation and Mapping Techniques 141
Leonardo Neumeyer, Mitchel Weintraub, SRI International (USA)
Noisy Speech Recognition Using Robust Inversion of Hidden Markov Models 145
Seokyong Moon, Jenq-Neng Hwang, University of Washington (USA)
Rapid Environment Adaptation for Robust Speech Recognition 149
Keizaburo Takagi, Hiroaki Hattori, Takao Watanabe, NEC Corporation (JAPAN)
Noise Estimation Techniques for Robust Speech Recognition 153
H.G. Hirsch, C. Ehrlicher, Aachen University of Technology (GERMANY)
Pole-Filtered Cepstral Mean Subtraction 157
Devang Naik, Rutgers University (USA)
LANGUAGE MODELING
Chair: Roni Rosenfeld, Carnegie Mellon University (USA)
Language Model Adaptation via Minimum Discrimination Information 161
P. Srinivasa Rao, Michael D. Monkowski, Salim Roukos, IBM T.J. Watson Research Center (USA)
Clustering Word Category Based on Binomial Posteriori Co-occurrence Distribution 165
Masafumi Tamoto, Takeshi Kawabata, NTT Basic Research Labs (JAPAN)
Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams 169
Sabine Deligne, Frédéric Bimbot, Telecom Paris (FRANCE)
An Integrated Grammar/Bigram Language Model Using Path Scores 173
Harvey Lloyd-Thomas, Jerry H. Wright, Ensigma Limited; Gareth J.F. Jones, University of Cambridge (UK)
Discourse Structure for Multi-Speaker Spontaneous Spoken Dialogs: Incorporating Heuristics into Stochastic RTNs 177
Sheryl R. Young, Carnegie Mellon University (USA)
Improved Backing-Off for M-Gram Language Modeling 181
Reinhard Kneser, Philips GmbH Research Laboratories; Hermann Ney, RWTH Aachen, University of Technology (GERMANY)
QWI: A Method for Improved Smoothing in Language Modelling 185
G. Bordel, I. Torres, Universidad del Pais Vasco; E. Vidal, Universidad Politecnica de Valencia (SPAIN)
Using a Stochastic Context-Free Grammar as a Language Model for Speech Recognition 189
Daniel Jurafsky, University of California at Berkeley; Chuck Wooters, Department of Defense; Jonathan Segal, University of California at Berkeley; Andreas Stolcke, SRI International; Eric Fosler, University of California at Berkeley; Gary Tajchman, Voice Processing Corporation; Nelson Morgan, University of California at Berkeley (USA)
Improved Language Modeling by Unsupervised Acquisition of Structure 193
Klaus Ries, Finn Dag Buø, University of Karlsruhe (GERMANY); Ye-Yi Wang, Carnegie Mellon University (USA); Alex Waibel, University of Karlsruhe (GERMANY)
Understanding Referring Expressions in a Person-Machine Spoken Dialogue 197
Claudia Pateras, Gregory Dudek, Renato De Mori, McGill University (CANADA)
USE OF KNOWLEDGE IN ASR
Chair: Jim Glass, Massachusetts Institute of Technology, LCS (USA)
Analysis of Acoustic-Phonetic Variations in Fluent Speech Using TIMIT 201
Don X. Sun, State University of New York (USA); Li Deng, University of Waterloo (CANADA)
Analyzing Weaknesses of Language Models for Speech Recognition 205
Joerg P. Ueberla, DRA Malvern (UK)
A Hidden Markov Model with Optimized Inter-Frame Dependence 209
F.J. Smith, J. Ming, P. O'Boyle, A.D. Irvine, The Queen's University (N. IRELAND)
On the Use of Scalar Quantization for Fast HMM Computation 213
Shigeki Sagayama, Satoshi Takahashi, NTT Human Interface Laboratories (JAPAN)
Large-vocabulary Speech Recognition in Specialized Domains 217
Haakon Chevalier, Chuck Ingold, Carol Kunz, Chip Moore, Crispen Roven, Jon Yamron, Bradley Baker, Paul Bamberg, Sarah Bridle, Tracy Bruce, Amy Weader, Dragon Systems, Inc. (USA)
Understanding and Improving Speech Recognition Performance through the Use of Diagnostic Tools 221
Ellen Eide, Herbert Gish, Philippe Jeanrenaud, Angela Mielke, BBN Systems and Technologies (USA)
Phrase Bigrams for Continuous Speech Recognition 225
Egidio P. Giachin, CSELT (ITALY)
Using Explicit Segmentation to Improve HMM Phone Recognition 229
Carl D. Mitchell, Mary P. Harper, Leah H. Jamieson, Purdue University (USA)
Viterbi Algorithm for Acoustic Vectors Generated by a Linear Stochastic Differential Equation on Each State 233
Marco Saerens, Universite Libre de Bruxelles (BELGIUM)
Non Deterministic Stochastic Language Models for Speech Recognition 237
G. Riccardi, E. Bocchieri, R. Pieraccini, AT&T Bell Laboratories (USA)
TOPICS IN SPEECH CODING
Chair: Peter Kroon, AT&T Bell Laboratories (USA)
Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels 241
Craig R. Watkins, Juin-Hwey Chen, AT&T Bell Laboratories (USA)
Reconstruction of Missing Packets for CELP-Based Speech Coders 245
Aamir Husain, Vladimir Cuperman, Simon Fraser University (CANADA)
A Robust Variable-Rate Speech Coder 249
A. Shen, B. Tang, A. Alwan, G. Pottie, University of California - Los Angeles (USA)
Wideband Speech Coding Using Multiple Codebooks and Glottal Pulses 253
C. McElroy, B.P. Murray, A.D. Fagan, University College - Dublin (IRELAND)
Speech Coding Using ISI Coded Quantization 257
Nam Phamdo, SUNY; Cheng-Chieh Lee, University of Maryland; Rajiv Laroia, AT&T Bell Laboratories (USA)
New Techniques for Multi-prototype Waveform Coding at 2.84 kb/s 261
I.S. Burnett, G.J. Bradley, University of Wollongong (AUSTRALIA)
Quantization of Non-Linear Predictors in Speech Coding 265
Jes Thyssen, Henrik Nielsen, Tele Danmark Research; Steffen Duus Hansen, Technical University of Denmark (DENMARK)
A Fast Robust Stochastic Algorithm for Vector Quantizer Design for Nonstationary Channels 269
B. Kövesi, S. Saoudi, J.M. Boucher, ENST/Bretagne (FRANCE); Z. Reguly, Technical University of Budapest (HUNGARY)
Voice Quality of Interconnected PCS, Japanese Cellular, and Public Switched Telephone Networks 273
Spiros Dimolitsas, Franklin L. Corcoran, Channasandra Ravishanker, COMSAT Laboratories; Marion Baraniecki, INTELSAT (USA)
Objective Speech Measure for Chinese in Wireless Environment 277
K.H. Lam, O.C. Au, C.C. Chan, K.F. Hui, S.F. Lau, Hong Kong University of Science & Technology (HONG KONG)
WORDSPOTTING, REJECTION, AND
TOPIC IDENTIFICATION
Chair: Jay Wilpon, AT&T Bell Laboratories (USA)
A Training Procedure for Verifying String Hypotheses in Continuous Speech Recognition 281
R.C. Rose, B.H. Juang, C.H. Lee, AT&T Bell Laboratories (USA)
Robust Utterance Verification for Connected Digits Recognition 285
Mazin G. Rahim, Chin-Hui Lee, Biing-Hwang Juang, AT&T Bell Laboratories (USA)
A Hybrid Wordspotting Method for Spontaneous Speech Understanding Using Word-Based Pattern Matching and Phoneme-Based HMM 289
Hiroshi Kanazawa, Mitsuyoshi Tachimori, Yoichi Takebayashi, Toshiba Corporation (JAPAN)
Acoustic and Language Modeling of Human and Nonhuman Noises for Human-to-Human Spontaneous Speech Recognition 293
T. Schultz, I. Rogina, University of Karlsruhe (GERMANY) and Carnegie Mellon University (USA)
LVCSR Log-Likelihood Ratio Scoring for Keyword Spotting 297
Mitchel Weintraub, SRI International (USA)
Keyword Spotting Using Supervised/Unsupervised Competitive Learning 301
Chakib Tadj, Franck Poirier, Telecom Paris (FRANCE)
A Continuous Density Neural Tree Network Word Spotting System 305
Stephen V. Kosonocky, IBM T.J. Watson Research Center; Richard J. Mammone, Rutgers University (USA)
Video Mail Retrieval: The Effect of Word Spotting Accuracy on Precision 309
G.J.F. Jones, J.T. Foote, K. Sparck Jones, S.J. Young, Cambridge University (UK)
Improved Topic Spotting through Statistical Modelling of Keyword Dependencies 313
Jerry H. Wright, Michael J. Carey, Eluned S. Parris, Ensigma Limited (UK)
Topic Focusing Mechanism for Speech Recognition Based on Probabilistic Grammar and Topic Markov Model 317
Takeshi Kawabata, NTT Basic Research Labs (JAPAN)
SPEAKER RECOGNITION
Chair: S. Parthasarathy, AT&T Bell Laboratories (USA)
The Influence of Noise on the Speaker Recognition Performance Using the Higher Frequency Band 321
Shoji Hayakawa, Fumitada Itakura, Nagoya University (JAPAN)
Measuring Fine Structure in Speech: Application to Speaker Identification 325
C.R. Jankowski, Jr., T.F. Quatieri, D.A. Reynolds, MIT Lincoln Laboratory (USA)
The Effects of Telephone Transmission Degradations on Speaker Recognition Performance 329
D.A. Reynolds, M.A. Zissman, T.F. Quatieri, G.C. O'Leary, B.A. Carlson, MIT Lincoln Laboratory (USA)
Covariance Estimation Methods for Channel Robust Text-Independent Speaker Identification 333
Michael Schmidt, Herbert Gish, Angela Mielke, BBN Systems and Technologies (USA)
Channel and Noise Compensation for Text Dependent Speaker Verification Over Telephone 337
William Y. Huang, ITT Aerospace Communications; Bhaskar D. Rao, University of California (USA)
Testing with the YOHO CD-ROM Voice Verification Corpus 341
Joseph P. Campbell, Jr., U.S. Department of Defense (USA)
An Orthogonal Polynomial Representation of Speech Signals and Its Probabilistic Model for Text Independent Speaker Verification 345
Chi-Shi Liu, Ministry of Transportation and Communications; Hsiao-Chuan Wang, National Tsing Hua University (TAIWAN); Frank K. Soong, AT&T Bell Laboratories (USA); Chao-Shih Huang, Ministry of Transportation and Communications (TAIWAN)
Text-Dependent Speaker Verification Using Data Fusion 349
Kevin R. Farrell, Dictaphone Corporation (USA)
Neural Net Approaches to Speaker Verification: Comparison with Second Order Statistic Measures 353
M. Mehdi Homayounpour, CNRS/URA (FRANCE); Gerard Chollet, IDIAP (SWITZERLAND)
A Subword Neural Tree Network Approach to Text-Dependent Speaker Verification 357
Han-Sheng Liou, Richard J. Mammone, Rutgers University (USA)
RECOGNITION: FEATURE ANALYSIS
Chair: Shigeki Sagayama, NTT (JAPAN)
Statistical Modeling of Speech Feature Vector Trajectories Based on a Piecewise Continuous Mean Path 361
Mark M. Thomson, University of Auckland (NEW ZEALAND)
Trace-Segmentation of Isolated Utterances for Speech Recognition 365
Euvaldo F. Cabral, Jr., University of Sao Paulo (BRAZIL); Graham D. Tattersall, University of East Anglia (UK)
Optimal Linear Feature Transformations for Semi-Continuous Hidden Markov Models 369
E. Günter Schukat-Talamazzini, Joachim Hornegger, Heinrich Niemann, Universität Erlangen-Nürnberg (GERMANY)
Use of Generalized Dynamic Feature Parameters for Speech Recognition: Maximum Likelihood and Minimum Classification Error Approaches 373
C. Rathinavelu, L. Deng, University of Waterloo (CANADA)
A Statistical Pattern Recognition Approach to Robust Recursive Identification of Non-stationary AR Model of Speech Production System 377
Milan Z. Markovic, Institute of Applied Math and Electronics; Branko D. Kovacevic, University of Belgrade; Milan M. Milosavljevic, Institute of Applied Math and Electronics (YUGOSLAVIA)
The NP Speech Activity Detection Algorithm 381
Joseph Pencak, Douglas Nelson, Department of Defense (USA)
Improved Speech Modeling and Recognition Using Multi-dimensional Articulatory States as Primitive Speech Units 385
L. Deng, J. Wu, H. Sameti, University of Waterloo (CANADA)
Speech Analysis Based on Malvar Wavelet Transform 389
Christophe Ris, Vincent Fontaine, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)
Magnitude Spectral Estimation via Poisson Moments with Application to Speech Recognition 393
Samel Celebi, Jose C. Principe, University of Florida (USA)
Stochastic Perceptual Models of Speech 397
Nelson Morgan, University of California at Berkeley (USA); Hervé Bourlard, Faculte Polytechnique (BELGIUM); Steven Greenberg, University of California at Berkeley; Hynek Hermansky, Oregon Graduate Institute; Su-Lin Wu, University of California at Berkeley (USA)
TOPICS IN NOISE AND RECOGNITION
Chair: Yariv Ephraim, George Mason University (USA)
Auditory Scene Analysis and Hidden Markov Model Recognition of Speech in Noise 401
P.D. Green, M.P. Cooke, M.D. Crawford, University of Sheffield (UK)
Speech Enhancement Based on Temporal Processing 405
Hynek Hermansky, Eric A. Wan, Carlos Avendano, Oregon Graduate Institute of Science & Technology (USA)
A Comparative Study of Mel Cepstra and EIH for Phone Classification under Adverse Conditions 409
Sumeet Sandhu, Oded Ghitza, AT&T Bell Laboratories (USA)
Supplementary Orthogonal Cepstral Features 413
Khaled T. Assaleh, Motorola, GSTG (USA)
Subband Analysis for Robust Speech Recognition in the Presence of Car Noise 417
Engin Erzin, Bilkent University; A. Enis Cetin, Koç University; Yasemin Yardimci, Bogazici University (TURKEY)
Robust Speech Feature Extraction Using SBCOR Analysis 421
Shoji Kajita, Fumitada Itakura, Nagoya University (JAPAN)
Methods for Improved Speech Recognition Over Telephone Lines 425
Alfred Hauenstein, Erwin Marschall, Siemens AG (GERMANY)
New HOS-Based Parameter Estimation Methods for Speech Recognition in Noisy Environments 429
Asunción Moreno, Sergio Tortola, Josep Vidal, José A. R. Fonollosa, Universitat Politecnica de Catalunya (SPAIN)
Noise Compensation for Speech Recognition in Car Noise Environments 433
Ruikang Yang, Petri Haavisto, Nokia Research Center (FINLAND)
Speech Recognition in Impulsive Noise 437
S.V. Vaseghi, B.P. Milner, University of East Anglia (UK)
RECOGNITION: TRAINING TECHNIQUES
Chair: Robin Rohlicek, BBN, Inc. (USA)
Speaker-independent Phone Modeling Based on Speaker-dependent HMMs' Composition and Clustering 441
Tetsuo Kosaka, Shoichi Matsunaga, ATR Interpreting Telecommunications Research Laboratories; Mikio Kuraoka, Toyohashi University of Technology (JAPAN)
Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems 445
P. Geutner, Universität Karlsruhe (GERMANY)
Optimal Splitting of HMM Gaussian Mixture Components with MMIE Training 449
Yves Normandin, Centre de Recherche Informatique de Montreal (CANADA)
Dictionary Learning: Performance through Consistency 453
Tilo Sloboda, Universität Karlsruhe (GERMANY)
Incremental MAP Estimation of HMMs for Efficient Training and Improved Performance 457
Yoshihiko Gotoh, Brown University (USA); Michael M. Hochberg, Cambridge University (UK); Daniel J. Mashao, Harvey F. Silverman, Brown University (USA)
Discrete MMI Probability Models for HMM Speech Recognition 461
J.T. Foote, Cambridge University (UK)
Global Discrimination for Neural Predictive Systems Based on N-Best Algorithm 465
Abdelhamid Mellouk, LRI, UA 410 CNRS; Patrick Gallinari, LAFORIA, UA CNRS 1095 (FRANCE)
Enhancement of Discriminative Capabilities of HMM Based Recognizer through Modification of Viterbi Algorithm 469
Jianming Song, The University of Wollongong (AUSTRALIA)
A Generalization of the Baum Algorithm to Functions on Non-linear Manifolds 473
D. Kanevsky, IBM T.J. Watson Research Center (USA)
Data-Driven Codebook Adaptation in Phonetically Tied SCHMMs 477
Thomas Kemp, Universität Karlsruhe (GERMANY)
SPEECH CODING BELOW 4 KB/S
Chair: Thomas E. Tremain, U.S. Department of Defense (USA)
NATO STANAG 4479: A Standard for an 800 bps Vocoder and Channel Coding in HF-ECCM System 480
B. Mouy, P. de La Noue, G. Goudezeune, Thomson CSF-RGS (FRANCE)
Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization 484
Masayuki Nishiguchi, Jun Matsumoto, Sony Corporation (JAPAN)
Progress Towards a New Government Standard 2400 bps Voice Coder 488
M.A. Kohler, L.M. Supplee, T.E. Tremain, U.S. Department of Defense (USA)
Variable Dimension Spectral Coding of Speech at 2400 bps and Below with Phonetic Classification 492
Amitava Das, Allen Gersho, University of California - Santa Barbara (USA)
Spectral Excitation Coding of Speech at 2.4 kb/s 496
V. Cuperman, P. Lupini, B. Bhattacharya, Simon Fraser University (CANADA)
A Robust 2400 bps Subband LPC Vocoder 500
P.A. Laurent, P. de La Noue, Thomson CSF-RGS (FRANCE)
Band-Widened Harmonic Vocoder at 2 to 4 kbps 504
Gao Yang, G. Zanellato, Lernout & Hauspie Speech Products; H. Leich, Faculte Polytechnique de Mons (BELGIUM)
A Speech Coder Based on Decomposition of Characteristic Waveforms 508
W. Bastiaan Kleijn, Jesper Haagen, AT&T Bell Laboratories (USA)
Speech Compression Using Pitch Synchronous Interpolation 512
R. Taori, R.J. Sluijter, E. Kathmann, Philips Research Laboratories (THE NETHERLANDS)
Pitch-Synchronous Multi-Band (PSMB) Speech Coding 516
Haiyun Yang, Soo-Ngee Koh, Pratab Sivaprakasapillai, Nanyang Technological University (SINGAPORE)
RECOGNITION: MODELING STRUCTURES
Chair: Hsiao-Wuen Hon, Apple ISS Research Centre, University of Singapore (SINGAPORE)
Four-Level Tied-Structure for Efficient Representation of Acoustic Modeling 520
Satoshi Takahashi, Shigeki Sagayama, NTT Human Interface Laboratories (JAPAN)
Application of Clustering Techniques to Mixture Density Modelling for Continuous-Speech Recognition 524
Christian Dugast, Peter Beyerlein, Reinhold Haeb-Umbach, Philips Research Laboratories (GERMANY)
Context Dependent Phonetic Duration Models for Decoding Conversational Speech 528
Michael D. Monkowski, Michael A. Picheny, P. Srinivasa Rao, IBM T.J. Watson Research Center (USA)
A Unified Way in Incorporating Segmental Feature and Segmental Model into HMM 532
Jun He, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)
Experimental Evaluation of Segmental HMMs 536
Wendy J. Holmes, Martin J. Russell, DRA Malvern (UK)
Improved Acoustic Modeling for Speech Recognition Using 2D Markov Random Fields 540
Helmut Lucke, ATR/ITL (JAPAN)
Structured Markov Models for Speech Recognition 544
F. Wolfertstetter, G. Ruske, Munich University of Technology (GERMANY)
Robust Parametric Modeling of Durations in Hidden Markov Models 548
David Burshtein, Tel-Aviv University (ISRAEL)
Improved Decision Trees for Phonetic Modeling 552
Roland Kuhn, Ariane Lazarides, Yves Normandin, Julie Brousseau, CRIM (CANADA)
High Speed Speech Recognition Using Tree-Structured Probability Density Function 556
Takao Watanabe, Koichi Shinoda, Keizaburo Takagi, Ken-ichi Iso, NEC Corporation (JAPAN)
RECOGNITION: SEARCH TECHNIQUES
Chair: Fil Alleva, Microsoft Corporation (USA)
A Fast Segmental Viterbi Algorithm for Large Vocabulary Recognition 560
P. Laface, C. Vair, Politecnico di Torino; L. Fissore, CSELT (ITALY)
Searching with a Transcription Graph 564
Z. Li, P. Kenny, D. O'Shaughnessy, Universite du Quebec (CANADA)
On the Use of Stochastic Inference Networks for Representing Multiple Word Pronunciations 568
Renato De Mori, Charles Snow, Michael Galler, McGill University School of Computer Science (CANADA)
A Tree Search Strategy for Large-Vocabulary Continuous Speech Recognition 572
P.S. Gopalakrishnan, L.R. Bahl, IBM; R.L. Mercer, Renaissance Technologies (USA)
Lattice-Based Search Strategies for Large Vocabulary Speech Recognition 576
F. Richardson, M. Ostendorf, Boston University; J.R. Rohlicek, Bolt Beranek & Newman, Inc. (USA)
On Using a priori Segmentation of the Speech Signal in an N-Best Solutions Post-processing 580
T. Moudenc, D. Jouvet, J. Monné, France Telecom (FRANCE)
Time-Synchronous Continuous Speech Recognizer Driven by a Context-Free Grammar 584
Tohru Shimizu, ATR/ITL; Seikou Monzen, Yamagata University; Harald Singer, Shoichi Matsunaga, ATR/ITL (JAPAN)
Language Model Representations for Beam-Search Decoding 588
Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico, I.R.S.T. (ITALY)
A Lower-Complexity Viterbi Algorithm 592
Sarvar Patel, Bellcore (USA)
Efficient Search Using Posterior Phone Probability Estimates 596
Steve Renals, University of Sheffield; Mike Hochberg, University of Cambridge (UK)
PROSODY FOR SYNTHESIS & RECOGNITION
Chair: Yoshinori Sagisaka, AT&T Bell Laboratories (USA)
Timing Patterns in Fluent and Disfluent Spontaneous Speech 600
Douglas O'Shaughnessy, Universite du Quebec (CANADA)
Stochastic Modeling of Pause Insertion Using Context-Free Grammar 604
Shigeru Fujio, Yoshinori Sagisaka, Norio Higuchi, ATR-ITL (JAPAN)
Automatic Classification of Pitch Movements via MLP-Based Estimation of Class Probabilities 608
Louis F.M. ten Bosch, Institute for Perception Research (THE NETHERLANDS)
On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems 612
Matthew A. Siegler, Richard M. Stern, Carnegie Mellon University (USA)
A Prosodic Model of Mandarin Speech and Its Application to Pitch Level Generation for Text-to-Speech 616
Shaw-Hwa Hwang, Sin-Horng Chen, National Chiao Tung University (REPUBLIC OF CHINA)
Prosodic Cues to Word Usage 620
Karen Ward, David G. Novick, Oregon Graduate Institute of Science & Technology (USA)
Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling 624
Mitsuru Nakai, Tohoku University; Harald Singer, Yoshinori Sagisaka, ATR-ITL; Hiroshi Shimodaira, JAIST (JAPAN)
Duration Modeling in Large Vocabulary Speech Recognition 628
Anastasios Anastasakos, Northeastern University; Richard Schwartz, Han Shu, BBN Systems and Technologies (USA)
Speaker-Independent Automatic Classification of Thai Tones in Connected Speech by Analysis-Synthesis Method 632
Siripong Potisuk, Mary P. Harper, Jackson T. Gandour, Purdue University (USA)
SPEECH SYNTHESIS & PRODUCTION
Chair: Kathleen Cummings, Georgia Institute of Technology (USA)
Speech Synthesis System Based on a Variable Decimation/Interpolation Factor 636
F.M. Gimenez de los Galanes, Universidad Politecnica de Madrid; M.H. Savoji, Universidad de Cantabria; J.M. Pardo, Universidad Politecnica de Madrid (SPAIN)
Automatic Speech Synthesiser Parameter Estimation Using HMMs 640
R.E. Donovan, P.C. Woodland, Cambridge University (UK)
Speaker Modification with LPC Pole Analysis 644
Janet Slifka, Systems Research Laboratories; Timothy R. Anderson, Armstrong Laboratory (USA)
Synthesizing Styled Speech Using the Klatt Synthesizer 648
Janet C. Rutledge, Northwestern University; Kathleen E. Cummings, Daniel A. Lambert, Mark A. Clements, Georgia Institute of Technology (USA)
Acoustical Measurements of the Vocal-Tract Area Function: Sensitivity Analysis and Experiments 652
Hani Yehia, Nagoya University; Masaaki Honda, NTT Basic Research Laboratories; Fumitada Itakura, Nagoya University (JAPAN)
Shape-Invariant Pitch-Synchronous Text-to-Speech Conversion 656
Eduardo R. Banga, Carmen Garcia-Mateo, Universidad de Vigo (SPAIN)
Speech Parameter Generation from HMM Using Dynamic Features 660
Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)
A Source Generator Based Modeling Framework for Synthesis of Speech Under Stress 664
Sahar E. Bou-Ghazale, John H.L. Hansen, Duke University (USA)
MBE Synthesis of Speech Coded in LPC Format 668
K.F. Lam, C.F. Chan, City Polytechnic of Hong Kong (HONG KONG)
Modeling Speech Production Using Yee's Finite Difference Method 672
Kathleen E. Cummings, Georgia Institute of Technology; James G. Maloney, Georgia Technical Research Institute; Mark A. Clements, Georgia Institute of Technology (USA)
SPEAKER ADAPTATION SPECTRAL QUANTIZATION
Chair: C.H. Lee, AT&TBell Laboratories (USA)
Batch, Incremental and Instantaneous AdaptationTechniques for Speech Recognition 676
G. Zavaliagkos, Northeastern University; R. Schwartz,J. Makhoul, BBN Systems and Technologies (USA)
Speaker Adaptation Using Combined Transformation
and Bayesian Methods 680
Vassilios Digalakis, Leonardo Neumeyer, SRIInternational (USA)
Rapid Speaker Adaptation Using Model Prediction 684S. M. Ahadi, P. C. Woodland, Cambridge University(UK)
Speaker Adaptation Based on Transfer Vector Field
Smoothing Using Maximum a posteriori ProbabilityEstimation 688
Masahiro Tonomura, Tetsuo Kosaka, Shoichi
Matsunaga, A TR Interpreting TelecommunicationsResearch Labs (JAPAN)
Experiments Using Data Augmentation for SpeakerAdaptation 692Jerome R. Bellegarda, Apple Computer Inc.; Peter V.de Souza, David Nahamoo, Mukund Padmanabhan,Michael A. Picheny, Lalit R. Bahl, IBM (USA)
Vector-Field-Smoothed Bayesian Learning forIncremental Speaker Adaptation 696Jun-ichi Takahashi, Shigeki Sagayama, NTTHumanInterface Laboratories (JAPAN)
A Speaker Adaptation Technique Using Linear
Regression 700S.J. Cox, University ofEast Anglia (UK)
Speaker Adaptation Based on Spectral Normalizationand Dynamic HMM Parameter Adaptation 704
Ming-Whei Feng, GTE Laboratories, Inc. (USA)
On-line Bayes Adaptation of SCHMM Parametersfor Speech Recognition 708Qiang Huo, Chorkin Chan, University ofHong Kong(HONGKONG)
Iterative Self-Learning Speaker and ChannelAdaptation under Various Initial ConditionsYunxin Zhao, University ofIllinois at Urbana-Champaign (USA)
712
Chair: Costas Xydeas, University ofManchester (UK)
Fast and Low-Complexity LSF Quantization UsingAlgebraic Vector Quantizer 716
Minjie Xie, Jean-Pierre Adoul, University ofSherbrooke(CANADA)
Low Cost Vector Quantization Methods for Spectral
Coding in Low Rate Speech Coders 720H.R. Sadegh Mohammadi, W.H. Holmes, UniversityofNew South Wales (AUSTRALIA)
Matrix Product Quantization for Very-Low-RateSpeech CodingStefan Bruhn, Technical University ofBerlin
(GERMANY)
724
An Intrinsically Reliable and Fast Algorithm to
Compute the Line Spectrum Pairs (LSP) in Low BitRate CELP Coding 728A. Goalie, S. Saoudi, ENST-Bretagne (FRANCE)
Spectral Dynamics Is More Important Than
Spectral Distortion 732H. Petter Knagenhjelm, W. Bastiaan Kleijn, AT&TBell Laboratories (USA)
Efficient Quantization of LSF Parameters UsingClassified SVQ Combined with Conditional
Splitting 736Dong-il Chang, Young-kwon Cho, Souguil Ann,Seoul National University (KOREA)
Efficient Coding of LSP Parameters Using Split Matrix Quantisation 740
C.S. Xydeas, C. Papanastasiou, University of Manchester (UK)
How Good Is Your p? Observations on VQ Training Ratios 744
John S. Collura, Thomas E. Tremain, U.S. Department of Defense (USA)
Variable Rate Spectral Quantization for Phonetically Classified CELP Coding 748
Roar Hagen, Chalmers University of Technology; Erdal Paksoy, Allen Gersho, University of California (USA)
Optimal Distortion Measures for the High Rate Vector Quantization of LPC Parameters 752
William R. Gardner, University of California-San Diego; Bhaskar D. Rao, QUALCOMM, Inc. (USA)
SPEECH ANALYSIS
Chair: Paul Mermelstein, INRS-Telecom (CANADA)
Harmonics Tracking and Pitch Extraction Based on Instantaneous Frequency 756
Toshihiko Abe, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)
Decomposition of Speech Signals into Deterministic and Stochastic Components 760
C. d'Alessandro, LIMSI-CNRS (FRANCE); B. Yegnanarayana, Indian Institute of Technology (INDIA); V. Darsinos, University of Patras (GREECE)
Modeling and Processing Speech with Sums of AM-FM Formant Models 764
Shan Lu, Peter C. Doerschuk, Purdue University(USA)
On the Statistical Properties of Line Spectrum Pairs 768
J.S. Erkelens, P.M.T. Broersen, Delft University of Technology (THE NETHERLANDS)
Individual Variations in Glottal Characteristics of Female Speakers 772
Helen M. Hanson, Harvard University (USA)
A Robust Method for Determining Instants of Major Excitations in Voiced Speech 776
B. Yegnanarayana, Indian Institute of Technology (INDIA); R.L.H.M. Smits, Institute for Perception Research (THE NETHERLANDS)
Interpolation of LPC Spectra via Pole Shifting 780
Vladimir Goncharoff, Maureen Kaine-Krolak, University of Illinois at Chicago (USA)
Speech Formant Frequency and Bandwidth Tracking Using Multiband Energy Demodulation 784
Alexandros Potamianos, Petros Maragos, Georgia Institute of Technology (USA)
Nonlinear Prediction for Speech Coding Using Radial Basis Functions 788
Fernando Díaz-de-María, Universidad de Cantabria; Aníbal R. Figueiras-Vidal, Universidad Politécnica de Madrid (SPAIN)
Recognition of Unvoiced Stops from Their Time-Frequency Representation 792
Maria Rangoussi, Anastasios Delopoulos, National Technical University of Athens (GREECE)
SPEECH ENHANCEMENT & NOISE REDUCTION
Chair: John H.L. Hansen, Duke University (USA)
Speech Enhancement Based on Masking Properties of the Auditory System 796
Nathalie Virag, Swiss Federal Institute of Technology (SWITZERLAND)
Optimizing Speech Enhancement by Exploiting Masking Properties of the Human Ear 800
A. Akbari Azirani, R. Le Bouquin Jeannes, G. Faucon, Université de Rennes I (FRANCE)
A Spectrally-Based Signal Subspace Approach for Speech Enhancement 804
Yariv Ephraim, Harry L. Van Trees, George Mason University (USA)
Real-Time Implementation of HMM-Based MMSE Algorithm for Speech Enhancement in Hearing Aid Applications 808
H. Sheikhzadeh, University of Waterloo; R.L. Brennan, Unitron Industries, Ltd.; H. Sameti, University of Waterloo (CANADA)
New Methods for Adaptive Noise Suppression 812
Levent Arslan, Alan McCree, Vishu Viswanathan, Texas Instruments (USA)
Single-Sensor Speech Enhancement Using a Soft-Decision/Variable Attenuation Algorithm 816
E. Bryan George, Lockheed Sanders, Inc. (USA)
Speech Enhancement Using a Ternary-Decision Based Filter 820
T.S. Sun, S. Nandkumar, J. Carmody, J. Rothweiler, A. Goldschen, N. Russell, S. Mpasi, P. Green, Martin Marietta Laboratories (USA)
Signal Modeling Enhancements for Automatic Speech Recognition 824
Zaki B. Nossair, Peter L. Silsbee, Stephen A. Zahorian, Old Dominion University (USA)
Co-Channel Speaker Separation 828
David P. Morgan, E. Bryan George, Texas Instruments; Leonard T. Lee, Lockheed Sanders, Inc.; Stephen M. Kay, University of Rhode Island (USA)
Speech Enhancement Based on the Generalized Dual Excitation Model with Adaptive Analysis Window 832
Chang D. Yoo, Jae S. Lim, Massachusetts Institute of Technology (USA)
SPECIAL TOPICS IN SPEECH RECOGNITION
Chair: K. Paliwal, Griffith University
Foreign Accent Classification Using Source Generator Based Prosodic Features 836
John H. L. Hansen, Levent M. Arslan, Duke University (USA)
Automatic Transcription of Unknown Words in a Speech Recognition System 840
R. Haeb-Umbach, P. Beyerlein, E. Thelen, Philips Research Laboratories-Aachen (GERMANY)
An Evaluation of an Adaptive Multichannel System for Speech Enhancement with Automatic Phase Alignment 844
Silvana L. do N. Cunha Costa, Benedito G. Aguiar Neto, Universidade Federal da Paraíba (BRAZIL)
Knowing Who to Listen to in Speech Recognition: Visually Guided Beamforming 848
Udo Bub, Martin Hunke, Alex Waibel, Carnegie Mellon University (USA)
An N-Best Strategy, Dynamic Grammars and Selectively Trained Neural Networks for Real-Time Recognition of Continuously Spelled Names Over the Telephone 852
Jean-Claude Junqua, Stephane Valente, Speech Technology Laboratory (USA); Dominique Fohr, Jean-Francois Mari, CRIN/INRIA (FRANCE)
Language Models for a Spelled Letter Recognizer 856
Martin Betz, Hermann Hild, Universität Karlsruhe (GERMANY)
Hands Free Continuous Speech Recognition in Noisy Environment Using a Four Microphone Array 860
D. Giuliani, M. Matassoni, M. Omologo, P. Svaizer, IRST (ITALY)
A New Method for Automatic Generation of Speaker-Dependent Phonological Rules 864
Toru Imai, Akio Ando, Eiichi Miyasaka, NHK Science & Technology Research Labs (JAPAN)
Enhancing Automatic Speech Recognition with an Ultrasonic Lip Motion Detector 868
David L. Jennings, Dennis W. Ruck, AFIT/ENG (USA)
Classification and Clustering of Stop Consonants via Nonparametric Transformations and Wavelets 872
Basilis Gidas, Brown University; Alejandro Murua, University of Chicago (USA)