Agnieszka Wagner Department of Phonetics, Institute of Linguistics,

Spoken Language Technologies: A review of application areas and research issues

Analysis and synthesis of F0 contours

Agnieszka Wagner

Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań

Humboldt-Kolleg, Słubice, 13-15 November 2008

Spoken Language Technologies: Introduction (1)

The need for and increasing interest in SLT systems:

- oral information is more efficient than a written message

- speech is the easiest and fastest way of communication (man – man, man – machine)

Progress in the field:

- technological advances in computer science

- availability of specialized speech analysis and processing tools

- collection and management of large speech corpora

- investigation of acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics

Introduction

Spoken Language Technologies: Introduction (2)

Speech synthesis (TTS, text-to-speech) systems

generate speech signal for a given input text

example: BOSS (Polish module developed at Dept. of Phonetics in cooperation with IKP, Uni Bonn)

ECESS (European Centre of Excellence in Speech Synthesis): standards of development of language resources, tools, modules and systems

Automatic speech recognition (ASR) systems

provide text of the input speech signal

example: Jurisdic (the first Polish ASR system for the needs of the Police, Public Prosecutors and Administration of Justice)

The tasks of SLT systems (TTS and ASR)

Spoken Language Technologies: Application areas

Application areas

Speech synthesis

telecommunications (access to textual information over the telephone)

information retrieval

measurement and control systems

fundamental & applied research on speech and language

a tool of communication, e.g. for the visually handicapped

Speech recognition & related technologies

text dictation

information retrieval & management

man-machine communication (together with speech synthesis):

- dialogue systems,

- speech-to-speech translation,

- Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed in the scope of the EURONOUNCE project)

Spoken Language Technologies: Performance of TTS and ASR systems

Performance

Speech synthesis

high intelligibility and naturalness in limited domains (e.g. broadcasting news)

Speech recognition

the best results for small vocabulary tasks

the state-of-the-art speaker-independent large-vocabulary continuous speech recognition (LVCSR) systems achieve a word-error rate of 3%

Generally, the output quality is high as regards generation/recognition of the linguistic propositional content of speech
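The word-error rate quoted above is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition; the function name and the sample sentences are illustrative assumptions and are not taken from the talk or from any particular ASR toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is missing in the output.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, this measure can exceed 1.0 when the hypothesis is much longer than the reference.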

Limitations

Spoken Language Technologies: Limitations of TTS and ASR systems

insufficient knowledge about methods for processing the non-verbal content of speech, i.e. affective information: speaker’s attitude, emotional state, mood, interpersonal stances & personality traits

Speech synthesis

lack of variability in speaking style which encodes affective information can be detrimental to communication (e.g. in speech-to-speech translation)

data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly

Speech recognition

transcription of conversational and expressive speech – substantially higher word-error rate


Progress

the need to model the non-verbal content of speech, i.e. affective information

Applications:

high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)

commerce – monitoring of the agent-customer interactions, information retrieval and management (e.g. QA5)

public security, criminology – secured area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)

Spoken Language Technologies: Progress in the field (1)


Progress

Spoken Language Technologies: Progress in the field (2)

Prosodic features: fundamental frequency (F0, the central acoustic variable that underlies intonation), intensity, duration and voice quality -> encoding and decoding of affective information

Emotion: Anger, Fear, Elation

- higher mean F0

- higher F0 variability

- higher intensity

- increased speaking rate

Emotion: Sadness, Boredom

- lower mean F0

- lower F0 variability

- lower intensity

- decreased speaking rate
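Two of the correlates listed above, mean F0 and F0 variability, are simple summary statistics of an F0 contour. The sketch below shows one way they could be computed in Python. It assumes the contour is a list of per-frame F0 values in Hz with unvoiced frames coded as 0, and it uses the standard deviation and the range as variability measures; these conventions, the function name and the sample values are assumptions made for illustration and are not specified in the slides.

```python
from statistics import mean, pstdev


def f0_summary(f0_hz):
    """Summarise an F0 contour given as per-frame values in Hz.

    Assumption: unvoiced frames are coded as 0 and are excluded.
    """
    voiced = [value for value in f0_hz if value > 0]
    if not voiced:
        raise ValueError("contour contains no voiced frames")
    return {
        "mean_f0": mean(voiced),               # overall pitch level
        "f0_sd": pstdev(voiced),               # variability: standard deviation
        "f0_range": max(voiced) - min(voiced), # variability: range
    }


# Hypothetical contour with two unvoiced frames.
summary = f0_summary([0, 100, 120, 0, 140])
```

Comparing such summaries across utterances is only meaningful relative to a speaker's own neutral baseline, since absolute F0 differs considerably between speakers.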

Intonation models: hierarchical, sequential, acoustic-phonetic, phonological, etc.

linguistic variation: well handled

affective, emotional variation: unaccounted for

The comprehensive intonation model: Components


a module of F0 contour analysis

a module of F0 contour synthesis

description of intonation

discrete tonal categories (higher-level, access to the meaning of the utterance)

acoustic parameters (low-level)

[Diagram: intonation description and F0, linked by analysis (encoding) and generation (decoding)]

The comprehensive intonation model: Analysis and Synthesis

Automatic analysis of F0 contours: Summary

results comparable to inter-labeler consistency in manual annotation of intonation

high accuracy achieved using small vectors of acoustic features

statistical modeling techniques

application: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets

Automatic synthesis of F0 contours: Summary

estimation of F0 values with a regression model

results comparable to those reported in the literature

natural (similar to the original ones) F0 contours for synthesis of high-quality and comprehensible speech (confirmed in perception tests)
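The slides state only that F0 values are estimated with a regression model, without naming its form or predictors. As a rough illustration of the general idea, the Python sketch below fits an ordinary least-squares line with a single predictor. The predictor (normalised position of a syllable in the phrase), the function names and the toy data are all hypothetical and were invented for this example; they do not reproduce the model or the results described in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_f0(position, slope, intercept):
    """Estimate an F0 value (Hz) for a normalised phrase position."""
    return intercept + slope * position


# Toy data, invented for illustration: F0 falling across the phrase.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
f0_targets_hz = [200.0, 190.0, 180.0, 170.0, 160.0]

slope, intercept = fit_line(positions, f0_targets_hz)
estimate = predict_f0(0.5, slope, intercept)
```

A practical model would use many predictors, for example the tonal category and its segmental context, but the fitting principle is the same.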

Audio (1): Mean opinion in the perception test: no audible difference

The comprehensive intonation model: Synthesis example (1)

The comprehensive intonation model: Synthesis example (2)

Audio (2): Mean opinion in the perception test: very good quality


Future research

Spoken Language Technologies: Future research issues

Extensive and systematic investigation of the mechanisms in voice production and perception of affective speech:

- contribution from other knowledge domains (psychology)

- affective speech data collection

- classification of affective states

- types of acoustic parameters

- measurement of affective inferences

THANK YOU FOR YOUR ATTENTION!
