
The 1995 International Conference on

Acoustics, Speech, and Signal Processing

CONFERENCE

PROCEEDINGS

VOLUME 1:

Speech

Sponsored by The Signal Processing Society of

The Institute of Electrical and Electronics Engineers

May 9-12, 1995 Westin Hotel Detroit, Michigan U.S.A.

95CH35732

Volume I

SPEECH

TABLE OF CONTENTS

CELP CODING

Chair: Jean-Pierre Adoul, University of Sherbrooke (CANADA)

4KBPS Improved Pitch Prediction CELP Speech Coding with 20ms Frame 1

Masahiro Serizawa, Kazunori Ozawa, NEC Corporation (JAPAN)

A Low-Complexity Toll-Quality Variable Bit Rate Coder for CDMA Cellular Systems 5

Peter Kroon, Michael Recchione, AT&T Bell Laboratories (USA)

Toll Quality 16 kb/s CELP Speech Coding with Very Low Complexity 9

Juin-Hwey Chen, AT&T Bell Laboratories (USA)

CELP Coding Using Trellis-Coded Vector Quantization of the Excitation 13

Andrei Popescu, Nicolas Moreau, Telecom Paris, Claude Lamblin, CNET/LAA/TSS/CMC (FRANCE)

Interpolating the History Improved Excitation Coding for High Quality CELP Coding 17

Per Hedelin, Thomas Eriksson, Chalmers University of Technology (SWEDEN)

Fast Stochastic Codebook Search Through the Use of Odd-Symmetric Crosscorrelation Basis Vectors 21

Cheung-Fat Chan, City University of Hong Kong (HONG KONG)

Improvements of Background Sound Coding in Linear Predictive Speech Coders 25

Torbjörn Wigren, Anders Bergström, Susanne Harrysson, Fredrik Jansson, Hans Nilsson, Ericsson Radio Systems AB (SWEDEN)

Improved CS-CELP Speech Coding in a Noisy Environment Using a Trained Sparse Conjugate Codebook 29

Akitoshi Kataoka, Sachiko Hosaka, NTT Human Interface Labs; Jotaro Ikedo, NTT Wireless Systems Laboratories; Takehiro Moriya, Shinji Hayashi, NTT Human Interface Labs. (JAPAN)

CELP Coding Based on Mel-cepstral Analysis 33

Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)

An Embedded Scheme for Regular Pulse Excited (RPE) Linear Predictive Coding 37

Shude Zhang, Gordon Lockhart, University of Leeds (UK)

RECOGNITION: LARGE VOCABULARY

Chair: Michael Picheny, IBM (USA)

Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task 41

L. R. Bahl, S. Balakrishnan-Aiyer, J.R. Bellegarda, M. Franz, P.S. Gopalakrishnan, D. Nahamoo, M. Novak, M. Padmanabhan, M.A. Picheny, S. Roukos, IBM (USA)

New Developments in the Lincoln Stack-Decoder Based Large Vocabulary CSR System 45

Douglas B. Paul, MIT Lincoln Laboratory (USA)

Large Vocabulary Continuous Speech Recognition Using Word Graphs 49

Xavier Aubert, Philips GmbH Research Laboratories - Aachen, Hermann Ney, Aachen University of Technology (GERMANY)

Reducing Word Error Rate on Conversational Speech from the Switchboard Corpus 53

P. Jeanrenaud, E. Eide, U. Chaudhari, J. McDonough, K. Ng, M. Shi, H. Gish, BBN Systems and Technologies (USA)

Golden Mandarin (III)--A User-Adaptive Prosodic-Segment-Based Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary 57

Ren-Yuan Lyu, National Taiwan University; Lee-Feng Chien, Academia Sinica; Shiao-Hong Hwang, Hung-Yun Hsieh, Rung-Chiuan Yang, Bo-Ren Bai, Jia-Chi Weng, Yen-Ju Yang, Shi-Wei Lin, National Taiwan University; Keh-Jiann Chen, Chiu-Yu Tseng, Lin-Shan Lee, Academia Sinica (REPUBLIC OF CHINA)

Complete Recognition of Continuous Mandarin Speech for Chinese Language with Very Large Vocabulary but Limited Training Data 61

Hsin-min Wang, Jia-lin Shen, Yen-Ju Yang, National Taiwan University; Chiu-yu Tseng, Academia Sinica; Lin-shan Lee, National Taiwan University (REPUBLIC OF CHINA)

Developments in Continuous Speech Dictation Using the ARPA WSJ Task 65

J.L. Gauvain, L. Lamel, M. Adda-Decker, LIMSI-CNRS (FRANCE)

Recent Improvements to the ABBOT Large Vocabulary CSR System 69

M.M. Hochberg, Cambridge University; S.J. Renals, University of Sheffield; A.J. Robinson, G. D. Cook, Cambridge University (ENGLAND)

The 1994 HTK Large Vocabulary Speech Recognition System 73

P. C. Woodland, C. J. Leggetter, J. J. Odell, V. Valtchev, S.J. Young, Cambridge University (UK)

Tangerine: A Large Vocabulary Mandarin Dictation System 77

Yuqing Gao, Hsiao-Wuen Hon, Zhiwei Lin, Gareth Loudon, S. Yogananthan, Baosheng Yuan, National University of Singapore (SINGAPORE)

ASR SYSTEM & CORPORA

Chair: John Bridle, Dragon (USA)

WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition 81

Tony Robinson, Jeroen Fransen, David Pye, Jonathan Foote, Steve Renals, Cambridge University (UK)

Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish 85

Yeshwant Muthusamy, Edward Holliman, Barbara Wheatley, Texas Instruments; Joseph Picone, Mississippi State University; John Godfrey, University of Pennsylvania (USA)

Implementation of the POW (Phonetically Optimized Words) Algorithm for Speech Database 89

Yeonja Lim, Youngjik Lee, ETRI (KOREA)

Microsoft Windows Highly Intelligent Speech Recognizer: Whisper 93

Xuedong Huang, Alex Acero, Fil Alleva, Mei-Yuh Hwang, Li Jiang, Milind Mahajan, Microsoft Corporation (USA)

Concept-Based Speech Translation 97

L. Mayfield, M. Gavalda, W. Ward, A. Waibel, Carnegie Mellon University (USA)

PhoneBook: A Phonetically-Rich Isolated-Word Telephone-Speech Database 101

John F. Pitrelli, Cynthia Fong, Suk H. Wong, Judith R. Spitz, Hong C. Leung, NYNEX Science & Technology, Inc. (USA)

CTIMIT: A Speech Corpus for the Cellular Environment with Applications to Automatic Speech Recognition 105

Kathy L. Brown, E. Bryan George, Lockheed Sanders, Inc. (USA)

Toward Movement-Invariant Automatic Lip-Reading and Speech Recognition 109

Paul Duchnowski, University of Karlsruhe (GERMANY); Martin Hunke, Carnegie Mellon University (USA); Dietrich Büsching, Uwe Meier, University of Karlsruhe (GERMANY); Alex Waibel, Carnegie Mellon University (USA)

Some Results with a Trainable Speech Translation and Understanding System 113

V.M. Jimenez, A. Castellanos, E. Vidal, Universidad Politecnica de Valencia (SPAIN)

A Continuous Speech Recognition System Using Finite State Network and Viterbi Beam Search for the Automatic Interpretation 117

Nam-Yong Han, Hoi-Rin Kim, Kyu-Woong Hwang, Young-Mok Ahn, Joon-Hyung Ryoo, ETRI (KOREA)

ROBUST SPEECH RECOGNITION

Chair: Richard Stern, Carnegie Mellon University (USA)

Robust Speech Recognition Based on Stochastic Matching 121

Ananth Sankar, SRI International; Chin-Hui Lee, AT&T Bell Laboratories (USA)

On the Robustness of Linear Discriminant Analysis as a Preprocessing Step for Noisy Speech Recognition 125

Olivier Siohan, CRIN-CNRS & INRIA - Lorraine (FRANCE)

A Maximum Likelihood Procedure for a Universal Adaptation Method Based on HMM Composition 129

Yasuhiro Minami, Sadaoki Furui, NTT Human Interface Laboratories (JAPAN)

A Fast and Flexible Implementation of Parallel Model Combination 133

M. J. F. Gales, S. J. Young, Cambridge University (U.K.)

Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition 137

Pedro J. Moreno, Bhiksha Raj, Evandro Gouvea, Richard M. Stern, Carnegie Mellon University (USA)

Robust Speech Recognition in Noise Using Adaptation and Mapping Techniques 141

Leonardo Neumeyer, Mitchel Weintraub, SRI International (USA)

Noisy Speech Recognition Using Robust Inversion of Hidden Markov Models 145

Seokyong Moon, Jenq-Neng Hwang, University of Washington (USA)

Rapid Environment Adaptation for Robust Speech Recognition 149

Keizaburo Takagi, Hiroaki Hattori, Takao Watanabe, NEC Corporation (JAPAN)

Noise Estimation Techniques for Robust Speech Recognition 153

H.G. Hirsch, C. Erlicher, Aachen University of Technology (GERMANY)

Pole-Filtered Cepstral Mean Subtraction 157

Devang Naik, Rutgers University (USA)

LANGUAGE MODELING

Chair: Roni Rosenfeld, Carnegie Mellon University (USA)

Language Model Adaptation via Minimum Discrimination Information 161

P. Srinivasa Rao, Michael D. Monkowski, Salim Roukos, IBM T.J. Watson Research Center (USA)

Clustering Word Category Based on Binomial posteriori Co-occurrence Distribution 165

Masafumi Tamoto, Takeshi Kawabata, NTT Basic Research Labs (JAPAN)

Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams 169

Sabine Deligne, Frédéric Bimbot, Telecom Paris (FRANCE)

An Integrated Grammar/Bigram Language Model Using Path Scores 173

Harvey Lloyd-Thomas, Jerry H. Wright, Ensigma Limited; Gareth J.F. Jones, University of Cambridge (UK)

Discourse Structure for Multi-Speaker Spontaneous Spoken Dialogs: Incorporating Heuristics into Stochastic RTNs 177

Sheryl R. Young, Carnegie Mellon University (USA)

Improved Backing-Off for M-Gram Language Modeling 181

Reinhard Kneser, Philips GmbH Research Laboratories, Hermann Ney, RWTH Aachen, University of Technology (GERMANY)

QWI: A Method for Improved Smoothing in Language Modelling 185

G. Bordel, I. Torres, Universidad del Pais Vasco; E. Vidal, Universidad Politecnica de Valencia (SPAIN)

Using a Stochastic Context-Free Grammar as a Language Model for Speech Recognition 189

Daniel Jurafsky, University of California at Berkeley; Chuck Wooters, Department of Defense; Jonathan Segal, University of California at Berkeley; Andreas Stolcke, SRI International; Eric Fosler, University of California at Berkeley; Gary Tajchman, Voice Processing Corporation; Nelson Morgan, University of California at Berkeley (USA)

Improved Language Modeling by Unsupervised Acquisition of Structure 193

Klaus Ries, Finn Dag Buø, University of Karlsruhe (GERMANY); Ye-Yi Wang, Carnegie Mellon University (USA); Alex Waibel, University of Karlsruhe (GERMANY)

Understanding Referring Expressions in a Person-Machine Spoken Dialogue 197

Claudia Pateras, Gregory Dudek, Renato De Mori, McGill University (CANADA)

USE OF KNOWLEDGE IN ASR

Chair: Jim Glass, Massachusetts Institute of Technology, LCS (USA)

Analysis of Acoustic-Phonetic Variations in Fluent Speech Using TIMIT 201

Don X. Sun, State University of New York (USA), Li Deng, University of Waterloo (CANADA)

Analyzing Weaknesses of Language Models for Speech Recognition 205

Joerg P. Ueberla, DRA Malvern (UK)

A Hidden Markov Model with Optimized Inter-Frame Dependence 209

F. J. Smith, J. Ming, P. O'Boyle, A.D. Irvine, The Queen's University (N. IRELAND)

On the Use of Scalar Quantization for Fast HMM Computation 213

Shigeki Sagayama, Satoshi Takahashi, NTT Human Interface Laboratories (JAPAN)

Large-vocabulary Speech Recognition in Specialized Domains 217

Haakon Chevalier, Chuck Ingold, Carol Kunz, Chip Moore, Crispen Roven, Jon Yamron, Bradley Baker, Paul Bamberg, Sarah Bridle, Tracy Bruce, Amy Weader, Dragon Systems, Inc. (USA)

Understanding and Improving Speech Recognition Performance through the Use of Diagnostic Tools 221

Ellen Eide, Herbert Gish, Philippe Jeanrenaud, Angela Mielke, BBN Systems and Technologies (USA)

Phrase Bigrams for Continuous Speech Recognition 225

Egidio P. Giachin, CSELT (ITALY)

Using Explicit Segmentation to Improve HMM Phone Recognition 229

Carl D. Mitchell, Mary P. Harper, Leah H. Jamieson, Purdue University (USA)

Viterbi Algorithm for Acoustic Vectors Generated by a Linear Stochastic Differential Equation on Each State 233

Marco Saerens, Universite Libre de Bruxelles (BELGIUM)

Non Deterministic Stochastic Language Models for Speech Recognition 237

G. Riccardi, E. Bocchieri, R. Pieraccini, AT&T Bell Laboratories (USA)

TOPICS IN SPEECH CODING

Chair: Peter Kroon, AT&T Bell Laboratories (USA)

Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels 241

Craig R. Watkins, Juin-Hwey Chen, AT&T Bell Laboratories (USA)

Reconstruction of Missing Packets for CELP-Based Speech Coders 245

Aamir Husain, Vladimir Cuperman, Simon Fraser University (CANADA)

A Robust Variable-Rate Speech Coder 249

A. Shen, B. Tang, A. Alwan, G. Pottie, University of California - Los Angeles (USA)

Wideband Speech Coding Using Multiple Codebooks and Glottal Pulses 253

C. McElroy, B.P. Murray, A.D. Fagan, University College - Dublin (IRELAND)

Speech Coding Using ISI Coded Quantization 257

Nam Phamdo, SUNY; Cheng-Chieh Lee, University of Maryland; Rajiv Laroia, AT&T Bell Laboratories (USA)

New Techniques for Multi-prototype Waveform Coding at 2.84 kb/s 261

I.S. Burnett, G.J. Bradley, University of Wollongong (AUSTRALIA)

Quantization of Non-Linear Predictors in Speech Coding 265

Jes Thyssen, Henrik Nielsen, Tele Danmark Research; Steffen Duus Hansen, Technical University of Denmark (DENMARK)

A Fast Robust Stochastic Algorithm for Vector Quantizer Design for Nonstationary Channels 269

B. Kövesi, S. Saoudi, J.M. Boucher, ENST/Bretagne (FRANCE); Z. Reguly, Technical University of Budapest (HUNGARY)

Voice Quality of Interconnected PCS, Japanese Cellular, and Public Switched Telephone Networks 273

Spiros Dimolitsas, Franklin L. Corcoran, Channasandra Ravishanker, COMSAT Laboratories; Marion Baraniecki, INTELSAT (USA)

Objective Speech Measure for Chinese in Wireless Environment 277

K.H. Lam, O.C. Au, C.C. Chan, K.F. Hui, S.F. Lau, Hong Kong University of Science & Technology (HONG KONG)

WORDSPOTTING, REJECTION, AND TOPIC IDENTIFICATION

Chair: Jay Wilpon, AT&T Bell Laboratories (USA)

A Training Procedure for Verifying String Hypotheses in Continuous Speech Recognition 281

R.C. Rose, B.H. Juang, C.H. Lee, AT&T Bell Laboratories (USA)

Robust Utterance Verification for Connected Digits Recognition 285

Mazin G. Rahim, Chin-Hui Lee, Biing-Hwang Juang, AT&T Bell Laboratories (USA)

A Hybrid Wordspotting Method for Spontaneous Speech Understanding Using Word-Based Pattern Matching and Phoneme-Based HMM 289

Hiroshi Kanazawa, Mitsuyoshi Tachimori, Yoichi Takebayashi, Toshiba Corporation (JAPAN)

Acoustic and Language Modeling of Human and Nonhuman Noises for Human-to-Human Spontaneous Speech Recognition 293

T. Schultz, I. Rogina, University of Karlsruhe (GERMANY) and Carnegie Mellon University (USA)

LVCSR Log-Likelihood Ratio Scoring for Keyword Spotting 297

Mitchel Weintraub, SRI International (USA)

Keyword Spotting Using Supervised/Unsupervised Competitive Learning 301

Chakib Tadj, Franck Poirier, Telecom Paris (FRANCE)

A Continuous Density Neural Tree Network Word Spotting System 305

Stephen V. Kosonocky, IBM T.J. Watson Research Center; Richard J. Mammone, Rutgers University (USA)

Video Mail Retrieval: The Effect of Word Spotting Accuracy on Precision 309

G.J.F. Jones, J.T. Foote, K. Sparck Jones, S.J. Young, Cambridge University (UK)

Improved Topic Spotting through Statistical Modelling of Keyword Dependencies 313

Jerry H. Wright, Michael J. Carey, Eluned S. Parris, Ensigma Limited (UK)

Topic Focusing Mechanism for Speech Recognition Based on Probabilistic Grammar and Topic Markov Model 317

Takeshi Kawabata, NTT Basic Research Labs (JAPAN)

SPEAKER RECOGNITION

Chair: S. Parthasarathy, AT&T Bell Laboratories (USA)

The Influence of Noise on the Speaker Recognition Performance Using the Higher Frequency Band 321

Shoji Hayakawa, Fumitada Itakura, Nagoya University (JAPAN)

Measuring Fine Structure in Speech: Application to Speaker Identification 325

C.R. Jankowski, Jr., T.F. Quatieri, D.A. Reynolds, MIT Lincoln Laboratory (USA)

The Effects of Telephone Transmission Degradations on Speaker Recognition Performance 329

D.A. Reynolds, M.A. Zissman, T.F. Quatieri, G.C. O'Leary, B.A. Carlson, MIT Lincoln Laboratory (USA)

Covariance Estimation Methods for Channel Robust Text-Independent Speaker Identification 333

Michael Schmidt, Herbert Gish, Angela Mielke, BBN Systems and Technologies (USA)

Channel and Noise Compensation for Text Dependent Speaker Verification Over Telephone 337

William Y. Huang, ITT Aerospace Communications; Bhaskar D. Rao, University of California (USA)

Testing with the YOHO CD-ROM Voice Verification Corpus 341

Joseph P. Campbell, Jr., U.S. Department of Defense (USA)

An Orthogonal Polynomial Representation of Speech Signals and Its Probabilistic Model for Text Independent Speaker Verification 345

Chi-Shi Liu, Ministry of Transportation and Communications; Hsiao-Chuan Wang, National Tsing Hua University (TAIWAN); Frank K. Soong, AT&T Bell Laboratories (USA); Chao-Shih Huang, Ministry of Transportation and Communications (TAIWAN)

Text-Dependent Speaker Verification Using Data Fusion 349

Kevin R. Farrell, Dictaphone Corporation (USA)

Neural Net Approaches to Speaker Verification: Comparison with Second Order Statistic Measures 353

M. Mehdi Homayounpour, CNRS/URA (FRANCE); Gerard Chollet, IDIAP (SWITZERLAND)

A Subword Neural Tree Network Approach to Text-Dependent Speaker Verification 357

Han-Sheng Liou, Richard J. Mammone, Rutgers University (USA)

RECOGNITION: FEATURE ANALYSIS

Chair: Shigeki Sagayama, NTT (JAPAN)

Statistical Modeling of Speech Feature Vector Trajectories Based on a Piecewise Continuous Mean Path 361

Mark M. Thomson, University of Auckland (NEW ZEALAND)

Trace-Segmentation of Isolated Utterances for Speech Recognition 365

Euvaldo F. Cabral, Jr., University of Sao Paulo (BRAZIL); Graham D. Tattersall, University of East Anglia (UK)

Optimal Linear Feature Transformations for Semi-Continuous Hidden Markov Models 369

E. Günter Schukat-Talamazzini, Joachim Hornegger, Heinrich Niemann, Universität Erlangen-Nürnberg (GERMANY)

Use of Generalized Dynamic Feature Parameters for Speech Recognition: Maximum Likelihood and Minimum Classification Error Approaches 373

C. Rathinavelu, L. Deng, University of Waterloo (CANADA)

A Statistical Pattern Recognition Approach to Robust Recursive Identification of Non-stationary AR Model of Speech Production System 377

Milan Z. Markovic, Institute of Applied Math and Electronics; Branko D. Kovacevic, University of Belgrade; Milan M. Milosavljevic, Institute of Applied Math and Electronics (YUGOSLAVIA)

The NP Speech Activity Detection Algorithm 381

Joseph Pencak, Douglas Nelson, Department of Defense (USA)

Improved Speech Modeling and Recognition Using Multi-dimensional Articulatory States as Primitive Speech Units 385

L. Deng, J. Wu, H. Sameti, University of Waterloo (CANADA)

Speech Analysis Based on Malvar Wavelet Transform 389

Christophe Ris, Vincent Fontaine, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)

Magnitude Spectral Estimation via Poisson Moments with Application to Speech Recognition 393

Samel Celebi, Jose C. Principe, University of Florida (USA)

Stochastic Perceptual Models of Speech 397

Nelson Morgan, University of California at Berkeley (USA); Hervé Bourlard, Faculte Polytechnique (BELGIUM); Steven Greenberg, University of California at Berkeley; Hynek Hermansky, Oregon Graduate Institute; Su-Lin Wu, University of California at Berkeley (USA)

TOPICS IN NOISE AND RECOGNITION

Chair: Yariv Ephraim, George Mason University (USA)

Auditory Scene Analysis and Hidden Markov Model Recognition of Speech in Noise 401

P.D. Green, M.P. Cooke, M.D. Crawford, University of Sheffield (UK)

Speech Enhancement Based on Temporal Processing 405

Hynek Hermansky, Eric A. Wan, Carlos Avendano, Oregon Graduate Institute of Science & Technology (USA)

A Comparative Study of Mel Cepstra and EIH for Phone Classification under Adverse Conditions 409

Sumeet Sandhu, Oded Ghitza, AT&T Bell Laboratories (USA)

Supplementary Orthogonal Cepstral Features 413

Khaled T. Assaleh, Motorola, GSTG (USA)

Subband Analysis for Robust Speech Recognition in the Presence of Car Noise 417

Engin Erzin, Bilkent University; A. Enis Cetin, Koç University; Yasemin Yardimci, Bogazici University (TURKEY)

Robust Speech Feature Extraction Using SBCOR Analysis 421

Shoji Kajita, Fumitada Itakura, Nagoya University (JAPAN)

Methods for Improved Speech Recognition Over Telephone Lines 425

Alfred Hauenstein, Erwin Marschall, Siemens AG (GERMANY)

New HOS-Based Parameter Estimation Methods for Speech Recognition in Noisy Environments 429

Asunción Moreno, Sergio Tortola, Josep Vidal, José A. R. Fonollosa, Universitat Politecnica de Catalunya (SPAIN)

Noise Compensation for Speech Recognition in Car Noise Environments 433

Ruikang Yang, Petri Haavisto, Nokia Research Center (FINLAND)

Speech Recognition in Impulsive Noise 437

S.V. Vaseghi, B.P. Milner, University of East Anglia (UK)

RECOGNITION: TRAINING TECHNIQUES

Chair: Robin Rohlicek, BBN, Inc. (USA)

Speaker-independent Phone Modeling Based on Speaker-dependent HMMs' Composition and Clustering 441

Tetsuo Kosaka, Shoichi Matsunaga, ATR Interpreting Telecommunications Research Laboratories; Mikio Kuraoka, Toyohashi University of Technology (JAPAN)

Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems 445

P. Geutner, Universität Karlsruhe (GERMANY)

Optimal Splitting of HMM Gaussian Mixture Components with MMIE Training 449

Yves Normandin, Centre de Recherche Informatique de Montreal (CANADA)

Dictionary Learning: Performance through Consistency 453

Tilo Sloboda, Universität Karlsruhe (GERMANY)

Incremental MAP Estimation of HMMs for Efficient Training and Improved Performance 457

Yoshihiko Gotoh, Brown University (USA); Michael M. Hochberg, Cambridge University (UK); Daniel J. Mashao, Harvey F. Silverman, Brown University (USA)

Discrete MMI Probability Models for HMM Speech Recognition 461

J.T. Foote, Cambridge University (UK)

Global Discrimination for Neural Predictive Systems Based on N-Best Algorithm 465

Abdelhamid Mellouk, LRI, UA 410 CNRS; Patrick Gallinari, LAFORIA, UA CNRS 1095 (FRANCE)

Enhancement of Discriminative Capabilities of HMM Based Recognizer through Modification of Viterbi Algorithm 469

Jianming Song, The University of Wollongong (AUSTRALIA)

A Generalization of the Baum Algorithm to Functions on Non-linear Manifolds 473

D. Kanevsky, IBM T.J. Watson Research Center (USA)

Data-Driven Codebook Adaptation in Phonetically Tied SCHMMs 477

Thomas Kemp, Universität Karlsruhe (GERMANY)

SPEECH CODING BELOW 4 KB/S

Chair: Thomas E. Tremain, U.S. Department of Defense (USA)

NATO STANAG 4479: A Standard for an 800 bps Vocoder and Channel Coding in HF-ECCM System 480

B. Mouy, P. de La Noue, G. Goudezeune, Thomson CSF-RGS (FRANCE)

Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization 484

Masayuki Nishiguchi, Jun Matsumoto, Sony Corporation (JAPAN)

Progress Towards a New Government Standard 2400 bps Voice Coder 488

M.A. Kohler, L.M. Supplee, T.E. Tremain, U.S. Department of Defense (USA)

Variable Dimension Spectral Coding of Speech at 2400 bps and Below with Phonetic Classification 492

Amitava Das, Allen Gersho, University of California - Santa Barbara (USA)

Spectral Excitation Coding of Speech At 2.4 kb/s 496

V. Cuperman, P. Lupini, B. Bhattacharya, Simon Fraser University (CANADA)

A Robust 2400 bps Subband LPC Vocoder 500

P. A. Laurent, P. de La Noue, Thomson CSF-RGS (FRANCE)

Band-Widened Harmonic Vocoder at 2 to 4 kbps 504

Gao Yang, G. Zanellato, Lernout & Hauspie Speech Products; H. Leich, Faculte Polytechnique de Mons (BELGIUM)

A Speech Coder Based on Decomposition of Characteristic Waveforms 508

W. Bastiaan Kleijn, Jesper Haagen, AT&T Bell Laboratories (USA)

Speech Compression Using Pitch Synchronous Interpolation 512

R. Taori, R.J. Sluijter, E. Kathmann, Philips Research Laboratories (THE NETHERLANDS)

Pitch-Synchronous Multi-Band (PSMB) Speech Coding 516

Haiyun Yang, Soo-Ngee Koh, Pratab Sivaprakasapillai, Nanyang Technological University (SINGAPORE)

RECOGNITION: MODELING STRUCTURES

Chair: Hsiao-Wuen Hon, Apple ISS Research Centre, University of Singapore (SINGAPORE)

Four-Level Tied-Structure for Efficient Representation of Acoustic Modeling 520

Satoshi Takahashi, Shigeki Sagayama, NTT Human Interface Laboratories (JAPAN)

Application of Clustering Techniques to Mixture Density Modelling for Continuous-Speech Recognition 524

Christian Dugast, Peter Beyerlein, Reinhold Haeb-Umbach, Philips Research Laboratories (GERMANY)

Context Dependent Phonetic Duration Models for Decoding Conversational Speech 528

Michael D. Monkowski, Michael A. Picheny, P. Srinivasa Rao, IBM T.J. Watson Research Center (USA)

A Unified Way in Incorporating Segmental Feature and Segmental Model into HMM 532

Jun He, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)

Experimental Evaluation of Segmental HMMs 536

Wendy J. Holmes, Martin J. Russell, DRA Malvern (UK)

Improved Acoustic Modeling for Speech Recognition Using 2D Markov Random Fields 540

Helmut Lucke, ATR/ITL (JAPAN)

Structured Markov Models for Speech Recognition 544

F. Wolfertstetter, G. Ruske, Munich University of Technology (GERMANY)

Robust Parametric Modeling of Durations in Hidden Markov Models 548

David Burshtein, Tel-Aviv University (ISRAEL)

Improved Decision Trees for Phonetic Modeling 552

Roland Kuhn, Ariane Lazarides, Yves Normandin, Julie Brousseau, CRIM (CANADA)

High Speed Speech Recognition Using Tree-Structured Probability Density Function 556

Takao Watanabe, Koichi Shinoda, Keizaburo Takagi, Ken-ichi Iso, NEC Corporation (JAPAN)

RECOGNITION: SEARCH TECHNIQUES

Chair: Fil Alleva, Microsoft Corporation (USA)

A Fast Segmental Viterbi Algorithm for Large Vocabulary Recognition 560

P. Laface, C. Vair, Politecnico di Torino; L. Fissore, CSELT (ITALY)

Searching with a Transcription Graph 564

Z. Li, P. Kenny, D. O'Shaughnessy, Universite du Quebec (CANADA)

On the Use of Stochastic Inference Networks for Representing Multiple Word Pronunciations 568

Renato De Mori, Charles Snow, Michael Galler, McGill University School of Computer Science (CANADA)

A Tree Search Strategy for Large-Vocabulary Continuous Speech Recognition 572

P.S. Gopalakrishnan, L.R. Bahl, IBM; R.L. Mercer, Renaissance Technologies (USA)

Lattice-Based Search Strategies for Large Vocabulary Speech Recognition 576

F. Richardson, M. Ostendorf, Boston University; J.R. Rohlicek, Bolt Beranek & Newman, Inc. (USA)

On Using a priori Segmentation of the Speech Signal in an N-Best Solutions Post-processing 580

T. Moudenc, D. Jouvet, J. Monné, France Telecom (FRANCE)

Time-Synchronous Continuous Speech Recognizer Driven by a Context-Free Grammar 584

Tohru Shimizu, ATR/ITL; Seikou Monzen, Yamagata University; Harald Singer, Shoichi Matsunaga, ATR/ITL (JAPAN)

Language Model Representations for Beam-Search Decoding 588

Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico, I.R.S.T. (ITALY)

A Lower-Complexity Viterbi Algorithm 592

Sarvar Patel, Bellcore (USA)

Efficient Search Using Posterior Phone Probability Estimates 596

Steve Renals, University of Sheffield; Mike Hochberg, University of Cambridge (UK)

PROSODY FOR SYNTHESIS & RECOGNITION

Chair: Yoshinori Sagisaka, AT&T Bell Laboratories (USA)

Timing Patterns in Fluent and Disfluent Spontaneous Speech 600

Douglas O'Shaughnessy, Universite du Quebec (CANADA)

Stochastic Modeling of Pause Insertion Using Context-Free Grammar 604

Shigeru Fujio, Yoshinori Sagisaka, Norio Higuchi, ATR-ITL (JAPAN)

Automatic Classification of Pitch Movements via MLP-Based Estimation of Class Probabilities 608

Louis F. M. ten Bosch, Institute for Perception Research (THE NETHERLANDS)

On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems 612

Matthew A. Siegler, Richard M. Stern, Carnegie Mellon University (USA)

A Prosodic Model of Mandarin Speech and Its Application to Pitch Level Generation for Text-to-Speech 616

Shaw-Hwa Hwang, Sin-Horng Chen, National Chiao Tung University (REPUBLIC OF CHINA)

Prosodic Cues to Word Usage 620

Karen Ward, David G. Novick, Oregon Graduate Institute of Science & Technology (USA)

Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling 624

Mitsuru Nakai, Tohoku University; Harald Singer, Yoshinori Sagisaka, ATR-ITL; Hiroshi Shimodaira, JAIST (JAPAN)

Duration Modeling in Large Vocabulary Speech Recognition 628

Anastasios Anastasakos, Northeastern University; Richard Schwartz, Han Shu, BBN Systems and Technologies (USA)

Speaker-Independent Automatic Classification of Thai Tones in Connected Speech by Analysis-Synthesis Method 632

Siripong Potisuk, Mary P. Harper, Jackson T. Gandour, Purdue University (USA)

SPEECH SYNTHESIS & PRODUCTION

Chair: Kathleen Cummings, Georgia Institute of Technology (USA)

Speech Synthesis System Based on a Variable Decimation/Interpolation Factor 636

F. M. Gimenez de los Galanes, Universidad Politecnica de Madrid; M. H. Savoji, Universidad de Cantabria; J. M. Pardo, Universidad Politecnica de Madrid (SPAIN)

Automatic Speech Synthesiser Parameter Estimation Using HMMs 640

R.E. Donovan, P.C. Woodland, Cambridge University (UK)

Speaker Modification with LPC Pole Analysis 644

Janet Slifka, Systems Research Laboratories; Timothy R. Anderson, Armstrong Laboratory (USA)

Synthesizing Styled Speech Using the Klatt Synthesizer 648

Janet C. Rutledge, Northwestern University; Kathleen E. Cummings, Daniel A. Lambert, Mark A. Clements, Georgia Institute of Technology (USA)

Acoustical Measurements of the Vocal-Tract Area Function: Sensitivity Analysis and Experiments 652

Hani Yehia, Nagoya University; Masaaki Honda, NTT Basic Research Laboratories; Fumitada Itakura, Nagoya University (JAPAN)

Shape-Invariant Pitch-Synchronous Text-to-Speech Conversion 656

Eduardo R. Banga, Carmen Garcia-Mateo, Universidad de Vigo (SPAIN)

Speech Parameter Generation from HMM Using Dynamic Features 660

Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)

A Source Generator Based Modeling Framework for Synthesis of Speech Under Stress 664

Sahar E. Bou-Ghazale, John H. L. Hansen, Duke University (USA)

MBE Synthesis of Speech Coded in LPC Format 668

K.F. Lam, C.F. Chan, City Polytechnic of Hong Kong (HONG KONG)

Modeling Speech Production Using Yee's Finite Difference Method 672

Kathleen E. Cummings, Georgia Institute of Technology; James G. Maloney, Georgia Technical Research Institute; Mark A. Clements, Georgia Institute of Technology (USA)

SPEAKER ADAPTATION

Chair: C.H. Lee, AT&T Bell Laboratories (USA)

Batch, Incremental and Instantaneous Adaptation Techniques for Speech Recognition 676

G. Zavaliagkos, Northeastern University; R. Schwartz, J. Makhoul, BBN Systems and Technologies (USA)

Speaker Adaptation Using Combined Transformation and Bayesian Methods 680

Vassilios Digalakis, Leonardo Neumeyer, SRI International (USA)

Rapid Speaker Adaptation Using Model Prediction 684

S. M. Ahadi, P. C. Woodland, Cambridge University (UK)

Speaker Adaptation Based on Transfer Vector Field Smoothing Using Maximum a posteriori Probability Estimation 688

Masahiro Tonomura, Tetsuo Kosaka, Shoichi Matsunaga, ATR Interpreting Telecommunications Research Labs (JAPAN)

Experiments Using Data Augmentation for Speaker Adaptation 692

Jerome R. Bellegarda, Apple Computer Inc.; Peter V. de Souza, David Nahamoo, Mukund Padmanabhan, Michael A. Picheny, Lalit R. Bahl, IBM (USA)

Vector-Field-Smoothed Bayesian Learning for Incremental Speaker Adaptation 696

Jun-ichi Takahashi, Shigeki Sagayama, NTT Human Interface Laboratories (JAPAN)

A Speaker Adaptation Technique Using Linear Regression 700

S.J. Cox, University of East Anglia (UK)

Speaker Adaptation Based on Spectral Normalization and Dynamic HMM Parameter Adaptation 704

Ming-Whei Feng, GTE Laboratories, Inc. (USA)

On-line Bayes Adaptation of SCHMM Parameters for Speech Recognition 708

Qiang Huo, Chorkin Chan, University of Hong Kong (HONG KONG)

Iterative Self-Learning Speaker and Channel Adaptation under Various Initial Conditions 712

Yunxin Zhao, University of Illinois at Urbana-Champaign (USA)

SPECTRAL QUANTIZATION

Chair: Costas Xydeas, University of Manchester (UK)

Fast and Low-Complexity LSF Quantization UsingAlgebraic Vector Quantizer 716

Minjie Xie, Jean-Pierre Adoul, University ofSherbrooke(CANADA)

Low Cost Vector Quantization Methods for Spectral

Coding in Low Rate Speech Coders 720H.R. Sadegh Mohammadi, W.H. Holmes, UniversityofNew South Wales (AUSTRALIA)

Matrix Product Quantization for Very-Low-RateSpeech CodingStefan Bruhn, Technical University ofBerlin

(GERMANY)

724

An Intrinsically Reliable and Fast Algorithm to

Compute the Line Spectrum Pairs (LSP) in Low BitRate CELP Coding 728A. Goalie, S. Saoudi, ENST-Bretagne (FRANCE)

Spectral Dynamics Is More Important Than Spectral Distortion 732

H. Petter Knagenhjelm, W. Bastiaan Kleijn, AT&T Bell Laboratories (USA)

Efficient Quantization of LSF Parameters Using Classified SVQ Combined with Conditional Splitting 736

Dong-il Chang, Young-kwon Cho, Souguil Ann, Seoul National University (KOREA)

Efficient Coding of LSP Parameters Using Split Matrix Quantisation 740

C.S. Xydeas, C. Papanastasiou, University of Manchester (UK)

How Good Is Your p? Observations on VQ Training Ratios 744

John S. Collura, Thomas E. Tremain, U.S. Department of Defense (USA)

Variable Rate Spectral Quantization for Phonetically Classified CELP Coding 748

Roar Hagen, Chalmers University of Technology; Erdal Paksoy, Allen Gersho, University of California (USA)

Optimal Distortion Measures for the High Rate Vector Quantization of LPC Parameters 752

William R. Gardner, University of California-San Diego; Bhaskar D. Rao, Qualcomm, Inc. (USA)

SPEECH ANALYSIS

Chair: Paul Mermelstein, INRS-Telecom (FRANCE)

Harmonics Tracking and Pitch Extraction Based on Instantaneous Frequency 756

Toshihiko Abe, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)

Decomposition of Speech Signals into Deterministic and Stochastic Components 760

C. d'Alessandro, LIMSI-CNRS (FRANCE); B. Yegnanarayana, Indian Institute of Technology (INDIA); V. Darsinos, University of Patras (GREECE)

Modeling and Processing Speech with Sums of AM-FM Formant Models 764

Shan Lu, Peter C. Doerschuk, Purdue University (USA)

On the Statistical Properties of Line Spectrum Pairs 768

J.S. Erkelens, P.M.T. Broersen, Delft University of Technology (THE NETHERLANDS)

Individual Variations in Glottal Characteristics of Female Speakers 772

Helen M. Hanson, Harvard University (USA)

A Robust Method for Determining Instants of Major Excitations in Voiced Speech 776

B. Yegnanarayana, Indian Institute of Technology (INDIA); R.L.H.M. Smits, Institute for Perception Research (THE NETHERLANDS)

Interpolation of LPC Spectra via Pole Shifting 780

Vladimir Goncharoff, Maureen Kaine-Krolak, University of Illinois at Chicago (USA)

Speech Formant Frequency and Bandwidth Tracking Using Multiband Energy Demodulation 784

Alexandros Potamianos, Petros Maragos, Georgia Institute of Technology (USA)

Nonlinear Prediction for Speech Coding Using Radial Basis Functions 788

Fernando Diaz-de-Maria, Universidad de Cantabria; Aníbal R. Figueiras-Vidal, Universidad Politecnica de Madrid (SPAIN)

Recognition of Unvoiced Stops from Their Time-Frequency Representation 792

Maria Rangoussi, Anastasios Delopoulos, National Technical University of Athens (GREECE)

SPEECH ENHANCEMENT & NOISE REDUCTION

Chair: John H.L. Hansen, Duke University (USA)

Speech Enhancement Based on Masking Properties of the Auditory System 796

Nathalie Virag, Swiss Federal Institute of Technology (SWITZERLAND)

Optimizing Speech Enhancement by Exploiting Masking Properties of the Human Ear 800

A. Akbari Azirani, R. Le Bouquin Jeannes, G. Faucon, Universite de Rennes I (FRANCE)

A Spectrally-Based Signal Subspace Approach for Speech Enhancement 804

Yariv Ephraim, Harry L. Van Trees, George Mason University (USA)

Real-Time Implementation of HMM-Based MMSE Algorithm for Speech Enhancement in Hearing Aid Applications 808

H. Sheikhzadeh, University of Waterloo; R.L. Brennan, Unitron Industries, Ltd.; H. Sameti, University of Waterloo (CANADA)

New Methods for Adaptive Noise Suppression 812

Levent Arslan, Alan McCree, Vishu Viswanathan, Texas Instruments (USA)

Single-Sensor Speech Enhancement Using a Soft-Decision/Variable Attenuation Algorithm 816

E. Bryan George, Lockheed Sanders, Inc. (USA)

Speech Enhancement Using a Ternary-Decision Based Filter 820

T.S. Sun, S. Nandkumar, J. Carmody, J. Rothweiler, A. Goldschen, N. Russell, S. Mpasi, P. Green, Martin Marietta Laboratories (USA)

Signal Modeling Enhancements for Automatic Speech Recognition 824

Zaki B. Nossair, Peter L. Silsbee, Stephen A. Zahorian, Old Dominion University (USA)

Co-Channel Speaker Separation 828

David P. Morgan, E. Bryan George, Texas Instruments; Leonard T. Lee, Lockheed Sanders, Inc.; Stephen M. Kay, University of Rhode Island (USA)

Speech Enhancement Based on the Generalized Dual Excitation Model with Adaptive Analysis Window 832

Chang D. Yoo, Jae S. Lim, Massachusetts Institute of Technology (USA)

SPECIAL TOPICS IN SPEECH RECOGNITION

Chair: K. Paliwal, Griffith University

Foreign Accent Classification Using Source Generator Based Prosodic Features 836

John H. L. Hansen, Levent M. Arslan, Duke University (USA)

Automatic Transcription of Unknown Words in a Speech Recognition System 840

R. Haeb-Umbach, P. Beyerlein, E. Thelen, Philips Research Laboratories-Aachen (GERMANY)

An Evaluation of an Adaptive Multichannel System for Speech Enhancement with Automatic Phase Alignment 844

Silvana L. do N. Cunha Costa, Benedito G. Aguiar Neto, Universidade Federal da Paraiba (BRAZIL)

Knowing Who to Listen to in Speech Recognition: Visually Guided Beamforming 848

Udo Bub, Martin Hunke, Alex Waibel, Carnegie Mellon University (USA)

An N-Best Strategy, Dynamic Grammars and Selectively Trained Neural Networks for Real-Time Recognition of Continuously Spelled Names Over the Telephone 852

Jean-Claude Junqua, Stephane Valente, Speech Technology Laboratory (USA); Dominique Fohr, Jean-Francois Mari, CRIN/INRIA (FRANCE)

Language Models for a Spelled Letter Recognizer 856

Martin Betz, Hermann Hild, Universitat Karlsruhe (GERMANY)

Hands Free Continuous Speech Recognition in Noisy Environment Using a Four Microphone Array 860

D. Giuliani, M. Matassoni, M. Omologo, P. Svaizer, IRST (ITALY)

A New Method for Automatic Generation of Speaker-Dependent Phonological Rules 864

Toru Imai, Akio Ando, Eiichi Miyasaka, NHK Science & Technology Research Labs (JAPAN)

Enhancing Automatic Speech Recognition with an Ultrasonic Lip Motion Detector 868

David L. Jennings, Dennis W. Ruck, AFIT/ENG (USA)

Classification and Clustering of Stop Consonants via Nonparametric Transformations and Wavelets 872

Basilis Gidas, Brown University; Alejandro Murua, University of Chicago (USA)
