Wolfgang Hess 60 years young Louis C.W. Pols Institute of Phonetic Sciences University of Amsterdam Bonn, Sept. 29, 2000 Speech is beautiful



Wolfgang Hess 60 years young

Louis C.W. Pols

Institute of Phonetic Sciences

University of Amsterdam

Bonn, Sept. 29, 2000

Speech is beautiful


IKP, Bonn
IFA, Amsterdam


Speech is beautiful

most natural form of communication
it is efficient
highly complex and challenging
towards multi- and interdisciplinary communities
natural speech synthesis full knowledge
ASR lasting challenge
speech is extremely robust to distortions
speech is eloquent; singing; speeches are awful
speech community is nice


robustness to degraded speech

partly reversed speech (Saberi & Perrott, Nature, 4/99)

fixed duration segments time reversed or shifted in time

perfect sentence intelligibility up to 50 ms (demo: every 50 ms reversed original)
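
The manipulation named on this slide (cutting the signal into fixed-duration segments and reversing each one in time) can be sketched in a few lines. This is only an illustration of the idea, not the stimulus-generation procedure of Saberi & Perrott; the function name `reverse_segments`, its parameters, and the representation of the signal as a plain list of samples are all assumptions made for the example.

```python
def reverse_segments(samples, sample_rate, segment_ms=50):
    """Return a copy of `samples` in which every consecutive segment of
    `segment_ms` milliseconds is reversed in time.

    Hypothetical helper for illustration: `samples` is any sequence of
    sample values and `sample_rate` is given in Hz. The segment order is
    kept; only the samples inside each segment are reversed. A shorter
    final segment is reversed as well.
    """
    # Segment length in samples (at least one sample per segment).
    seg_len = max(1, int(round(sample_rate * segment_ms / 1000)))
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        out.extend(reversed(segment))
    return out


if __name__ == "__main__":
    # Toy signal: 10 samples at 1000 Hz, reversed in 4 ms (4-sample) segments.
    print(reverse_segments(list(range(10)), 1000, segment_ms=4))
```

At a sampling rate of 16 kHz, the 50 ms segment duration mentioned on the slide corresponds to 800 samples per segment. Applying the function twice with the same settings restores the original signal, since the segment boundaries do not move.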


Wolfgang

engineer by training
emphasis on signal processing (Munich)
pitch-synchronous spectral analysis
applied for phoneme and word recognition and for voice detection and pitch extraction
speech synthesis (Bonn)


History, almost 30 yrs ago

7th International Congress on Acoustics 1971, Budapest, Hungary

first international (speech) conference
Satellite Speech Symposium, Szeged
Hess, “Grundfrequenzsynchrone digitale Spektralanalyse von Sprachsignalen mit beliebig feiner Auflösung im Frequenzbereich”
- also papers in German, and even in Russian
- engineering interest in speech analysis
- forthcoming specialization in speech recognition & pitch extraction


Budapest ICA

many influential people from international speech science community, already present there

topics at that time far away from our present interests in almost every respect:
- topics and ambitions
- approaches taken
- type and size of data sets

see some names and topics (nostalgia!)


speech processing

Velichko (Russia): dynamic programming
Bishnu Atal: towards predictive coding
Sakoe (Japan): dynamic processing for time normalization
Osamu Fujimura:
- dynamic palatography,
- electromyography (hooked-wire electrodes),
- computer-controlled dynamic radiography (Tokyo x-ray microbeam generator)
Jim Flanagan: focal points in sp. comm. research


speech synthesis

Cecil Coker: articulatory synthesis
Paul Mermelstein & Bishnu Atal: vocal transfer functions for speech synthesis
Johan Liljencrants: formant synthesis OVE III
Helmut Mangold: synthesis with a limited set of dynamic transitions
Werner Endress: synthesis via intermediate sounds
Peter Denes: word concatenation
Fujimura, Coker & Umeda: prosody in synthesis
Larry Rabiner: 2-pole digital filters for synthesis

“we were away a year ago”


speech recognition

Hans Tillmann (abs.): Bonner DAWID-II-system
Kasya, Kido, Krause & Tarnóczy: vowel recogn.
Velichko: 60 words
Rao: 225 VCV utterances, diad matching
Sakoe: 2300 isolated Japanese 10 digits
Dreyfus-Graf: artificial language
Erman: 54 isolated words over telephone
Neeley: 54 words recognition in noise
Pols: 50 Dutch words, stationary phoneme parts
Renato de Mori: zero crossings


speech perception, musical acoustics, psycho-acoustics

Rao: plosive-vowel interaction
Kozhevnikov: AM vowel-like stimuli
Ludmilla Chistovich: vowel discrimination
Johan Sundberg: pitch extraction of folk music
Max Mathews: music synthesis
Tammo Houtgast: lateral inhibition in psychoac.
Evans & Wilson: neurophysiological evidence
Bela Julesz: critical bands in vision and audition
Egbert de Boer: reverse-correlation method


Wolfgang’s further career

Dissertation in 1972
- “Digitale grundfrequenzsynchrone Analyse von Sprachsignalen als Teil eines automatischen Spracherkennungssystems”
Masterpiece in 1983, 698-page book
- “Pitch determination of speech signals. Algorithms and devices”, published by Springer-Verlag.
Chair in Phonetics in Bonn in 1986
publications, keynotes, conference organizer


ESCA/ISCA and Eurospeech

ESCA founded in 1988
Joseph Mariani first president (1988-1993)
Louis Pols 2nd president (1993-1997)
Wolfgang final keynote at E’97 in Rhodes
since Sept. 1997: Roger Moore president
since death of Christian Benoit (April 25, 1998): Wolfgang secretary of ESCA
since Eurospeech’99 in Budapest: ISCA


ICA 1971

all speech analysis based on filters or formants
LPC was about to be introduced
all synthesis based on formant synthesis
diphone concept did not yet exist
virtually no attention for TTS synthesis-by-rule
all speech recognition based on word-template matching
probabilistic approach yet unknown
vocabulary size of the order of 50 words only


present-day synthesis

mainly corpus-based concatenative synthesis with non-uniform units (e.g., CHATR, Festival, Next-Gen, Laureate, Bonner system)

large storage, optimal search
high naturalness and intelligibility
but ... one speaker, one style, one application
room for further improvement


possible improvements

general or application-specific corpus
how to reduce storage requirements
annotation details at various levels
optimize search algorithms and cost functions
fewer prototypes, generate certain variants
preferable units, fall-back mechanism
new voice, speaking style, emotion, rate
can voice be personalized
(cont.)


possible improvements (cont.)

how much manipulation in concatenation
combining stored speech and synthetic speech
better prosody (copy, concept, rules)
intonation modelling (discrete or continuous; detailed or sparse; signal oriented or linguistically meaningful)
concept for duration modelling
sentence accent and prominence


presently not very popular

formant synthesis (but see MITalk)
diphone and demisyllable synthesis (but see many operational systems: Dutch Fluency, German Hadifix, Multi-lingual Lucent TTS)
use of forms of parameterized speech (as soon as more manipulation is required again)
many voices, speaking styles, emotions, rates
importance of system evaluation (Jenolan Caves)


future for Wolfgang

being in the midst of new and challenging developments

to produce (in the most efficient way) the highest achievable quality of synthetic speech (given specific dialogue applications) is a large responsibility but also a lot of fun to do
(cont.)


future for Wolfgang (cont.)

Wolfgang and the IKP group enjoy doing this for German and other languages and like to report about it at international forums
it attracts many good students
these are excellent conditions for continuing this work

I wish Wolfgang and all his colleagues a lot of success in the years to come!