Spoken Language Technologies: A review of application areas and research issues
Analysis and synthesis of F0 contours
Agnieszka Wagner
Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań
Humboldt-Kolleg, Słubice 13.-15. November 2008
Spoken Language Technologies: Introduction (1)
The need for and increasing interest in SLT systems:
oral information is more efficient than a written message
speech is the easiest and fastest way of communication (man – man, man – machine)
Progress in the field:
technological advances in computer science
availability of specialized speech analysis and processing tools
collection and management of large speech corpora
investigation of acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics
Spoken Language Technologies: Introduction (2)
Speech synthesis (TTS, text-to-speech) systems
generate speech signal for a given input text
example: BOSS (Polish module developed at Dept. of Phonetics in cooperation with IKP, Uni Bonn)
ECESS (European Centre of Excellence in Speech Synthesis): standards of development of language resources, tools, modules and systems
Automatic speech recognition (ASR) systems
provide text of the input speech signal
example: Jurisdic (first Polish ASR system for needs of Police, Public Prosecutors and Administration of Justice)
The tasks of SLT systems (TTS and ASR)
Spoken Language Technologies: Application areas
Speech synthesis
telecommunications (access to textual information over the telephone)
information retrieval
measurement and control systems
fundamental & applied research on speech and language
a tool of communication, e.g. for the visually handicapped
Speech recognition & related technologies
text dictation
information retrieval & management
man-machine communication (together with speech synthesis):
- dialogue systems,
- speech-to-speech translation,
- Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed in the scope of the EURONOUNCE project)
Spoken Language Technologies: Performance of TTS and ASR systems
Speech synthesis
high intelligibility and naturalness in limited domains (e.g. broadcasting news)
Speech recognition
the best results for small vocabulary tasks
the state-of-the-art speaker-independent LVCSR systems achieve a word-error rate of 3%
Generally, the output quality is high as regards generation/recognition of the linguistic propositional content of speech
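The word-error rate quoted for LVCSR systems is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch of that computation follows; the function name and the whitespace tokenization are assumptions made for illustration, not part of any system mentioned in the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()   # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one insertion against four reference words.
    print(word_error_rate("a b c d", "a x c d e"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.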
Spoken Language Technologies: Limitations of TTS and ASR systems
insufficient knowledge about methods for processing the non-verbal content of speech i.e. affective information – speaker’s attitude, emotional state, mood, interpersonal stances & personality traits
Speech synthesis
lack of variability in speaking style which encodes affective information can be detrimental to communication (e.g. in speech-to-speech translation)
data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly
Speech recognition
transcription of conversational and expressive speech – substantially higher word-error rate
Spoken Language Technologies: Progress in the field (1)
the need for modeling the non-verbal content of speech, i.e. affective information
Applications:
high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)
commerce – monitoring of agent-customer interactions, information retrieval and management (e.g. QA5)
public security, criminology – secured area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)
Spoken Language Technologies: Progress in the field (2)
Prosodic features: fundamental frequency (F0 – the central acoustic variable that underlies intonation), intensity, duration and voice quality -> encoding and decoding of affective information
Emotion: Anger, Fear, Elation
higher mean F0
higher F0 variability
higher intensity
increased speaking rate
Emotion: Sadness, Boredom
lower mean F0
lower F0 variability
lower intensity
decreased speaking rate
Intonation models: hierarchical, sequential, acoustic-phonetic, phonological, etc.
linguistic variation – well handled
affective, emotional variation – unaccounted for
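Two of the emotion correlates listed on this slide, mean F0 and F0 variability, can be read directly off an F0 contour. The sketch below shows one possible way to do so. It assumes the contour is a list of per-frame F0 values in Hz, that unvoiced frames are coded as 0, and that variability is measured as the standard deviation; the talk specifies none of these, and the two example contours are made-up numbers, not measured data.

```python
from statistics import mean, pstdev


def f0_statistics(f0_hz):
    """Return (mean F0, F0 standard deviation) over voiced frames.

    Assumption: frames with F0 <= 0 are unvoiced and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return mean(voiced), pstdev(voiced)


# Hypothetical contours for illustration only.
flat_contour = [0, 118, 120, 122, 121, 0, 119, 120]
lively_contour = [0, 180, 220, 260, 210, 0, 170, 240]

if __name__ == "__main__":
    for name, contour in (("flat", flat_contour), ("lively", lively_contour)):
        mean_f0, sd_f0 = f0_statistics(contour)
        print(f"{name}: mean F0 = {mean_f0:.1f} Hz, F0 SD = {sd_f0:.1f} Hz")
```

In practice such values would usually be normalized per speaker before any comparison across emotions, since absolute F0 differs strongly between voices.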
The comprehensive intonation model: Components
a module of F0 contour analysis
a module of F0 contour synthesis
description of intonation:
discrete tonal categories (higher-level, access to the meaning of the utterance)
acoustic parameters (low-level)
[Diagram: intonation description and F0, linked by analysis (encoding) and generation (decoding)]
The comprehensive intonation model: Analysis and Synthesis
Automatic analysis of F0 contours: Summary
results comparable to inter-labeler consistency in manual annotation of intonation
high accuracy achieved using small vectors of acoustic features
statistical modeling techniques
application: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
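The slide does not say which statistical models or acoustic features were used. To illustrate the general idea of assigning a discrete tonal category to a small acoustic feature vector, here is a sketch using a nearest-centroid classifier as a stand-in technique. The two features (F0 slope and F0 range), the labels "rise" and "fall", and all numbers are hypothetical.

```python
from math import dist


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples: iterable of (feature_vector, label) pairs.
    Returns a dict mapping each label to its centroid.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Hypothetical training data: (F0 slope in Hz/s, F0 range in Hz), label.
training = [
    ((30.0, 40.0), "rise"),
    ((25.0, 35.0), "rise"),
    ((-28.0, 38.0), "fall"),
    ((-32.0, 45.0), "fall"),
]
centroids = train_centroids(training)

if __name__ == "__main__":
    print(classify(centroids, (20.0, 30.0)))
    print(classify(centroids, (-15.0, 50.0)))
```

A real labeling system would need features on comparable scales and a held-out evaluation against manual annotation, which is what the comparison with inter-labeler consistency refers to.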
Automatic synthesis of F0 contours: Summary
estimation of F0 values with a regression model
results comparable to those reported in the literature
natural (similar to the original ones) F0 contours for synthesis of high-quality and comprehensible speech (confirmed in perception tests)
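The regression model used for F0 estimation is not described in the slides. As a rough illustration of the principle, the sketch below fits an ordinary least-squares line that predicts an F0 target from a single predictor. The predictor (relative position of a syllable in the phrase) and the data values are assumptions chosen only to make the example concrete; an actual model would use more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: relative position in the phrase (0..1) and
# the F0 target in Hz observed at that position.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
f0_targets = [200.0, 190.0, 180.0, 170.0, 160.0]
model = fit_line(positions, f0_targets)

if __name__ == "__main__":
    print(f"predicted F0 at position 0.6: {predict(model, 0.6):.1f} Hz")
```

The predicted values would then be compared with the original contours, for instance by an error measure or in a perception test as reported on the slide.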
Audio (1): Mean opinion in the perception test: no audible difference
The comprehensive intonation model: Synthesis example (1)
The comprehensive intonation model: Synthesis example (2)
Audio (2): Mean opinion in the perception test: very good quality
Spoken Language Technologies: Future research issues
Extensive and systematic investigation of the mechanisms in voice production and perception of affective speech:
contribution from other knowledge domains (psychology)
affective speech data collection
classification of affective states
types of acoustic parameters
measurement of affective inferences
THANK YOU FOR YOUR ATTENTION!