Experiments on Building Language Resources for Multi-Modal Dialogue Systems Goals identification of...

Experiments on Building Language Resources for Multi-Modal Dialogue Systems Goals • identification of a methodology for adapting linguistic resources for human-machine dialogue systems, without training corpora; •Multilinguality supported, uniform linguistic coverage of speech interpretation modules •case study: synchronising parser’s and SR interface for French for MIAMM project L.Romary*, A.Todirascu**, D.Langlois* *LORIA, Nancy, France **Université de Technologie de Troyes, France Speech Recognition’s Experiments • system: ESPERE (small vocabularies) • protocol: 88 sentences recorded, 4 speakers, 2 men and 2 females, no OOV, all sentences in language • several methods for estimating P(w|v) 1.Frequencies in training corpus: WER = 3.8, SD = 0.9 2.Uniform probabilities for bigrams in training corpus: WER = 4.0, SD = 0.4 • True frequencies are not useful 3.All bigrams are possible (not null probabilities, back- off with ): WER >> 4.0 • Constraint bigrams from grammar are necessary 4.Training corpus (A) is adaptation corpus for a general newspaper one (B): • Linear combination performance are less good than when A is used alone Parser’s Experiments • iteration updating process after interaction with other modules • adding new lexical entries • local grammars •generated by a meta-grammar •preference for substitution •Domain-specific syntactic components (elliptic phrases, navigation verbs, noun groups) • mapping lexical entries to domain ontology • transforming derivation trees into MMIL representations The Multimedia Information Access Using Multiple Modalities (MIAMM) Project (2002-2003) •A prototype of a human-machine dialogue interface regrouping various interaction modalities: speech, haptics, graphics; • Case study: searching music into an existing database using all the modalities • Multilinguality supported: English, French , German • difficulties: multi-modal training corpora not available, Information flow: several models for the same language (one for speech, one for parsing : how to cover the same language), changing specifications during the project Adapting parser’s resources •TAG widely accepted formalism for syntactic parsing •XML standard for grammars (TAGML) and for semantic representations (MMIL) •existing resources for English, French for free texts Speech recognizer’s language model •Statistical language model •400 words vocabulary •Huge number of possible sentences •Training corpus: generated with context-free grammar. Two steps: •Bigram model using classes (ex: P(DECADES| the) •Uniform distribution of words into classes: P(90’s|DECADES) User Scenarios •useful in the context of lacking real human- machine interactions; •designed to obtain homogeneous linguistic coverage, for all the languages: •several styles (or registers - familiar, elaborated); •specific phrases (politeness phrases, time intervals - "from the sixties"); •various syntactic components (passive constructions, relative clauses, questions and ellipses); •dates or names •developers worked independently •building exhaustive user scenarios Context-free grammar •covers the linguistic phenomena from the user scenario, for every language Technical vocabulary •covers the linguistic phenomena from the user scenario, for every language Developping the language resources

Upload
gary-hudson
Category

Documents
view
218
download
3

Embed Size (px):

Transcript of Experiments on Building Language Resources for Multi-Modal Dialogue Systems Goals identification of...

Experiments on Building Language Resources for Multi-Modal Dialogue Systems

Goals• identification of a methodology for adapting linguistic resources for human-machine dialogue systems, without training corpora;

•Multilinguality supported, uniform linguistic coverage of speech interpretation modules

•case study: synchronising parser’s and SR interface for French for MIAMM project

L.Romary*, A.Todirascu**, D.Langlois* *LORIA, Nancy, France

**Université de Technologie de Troyes, France

Speech Recognition’s Experiments• system: ESPERE (small vocabularies)• protocol: 88 sentences recorded, 4 speakers, 2 men and 2 females, no OOV, all

sentences in language• several methods for estimating P(w|v)

1. Frequencies in training corpus: WER = 3.8, SD = 0.92. Uniform probabilities for bigrams in training corpus: WER = 4.0, SD = 0.4

• True frequencies are not useful3. All bigrams are possible (not null probabilities, back-off with ): WER >> 4.0

• Constraint bigrams from grammar are necessary4. Training corpus (A) is adaptation corpus for a general newspaper one (B):

• Linear combination performance are less good than when A is used alone

Parser’s Experiments• iteration updating process after interaction with other modules• adding new lexical entries• local grammars

•generated by a meta-grammar•preference for substitution•Domain-specific syntactic components (elliptic phrases, navigation verbs, noun groups)

• mapping lexical entries to domain ontology • transforming derivation trees into MMIL representations

The Multimedia Information Access Using Multiple Modalities (MIAMM) Project (2002-2003)•A prototype of a human-machine dialogue interface regrouping various interaction modalities: speech, haptics, graphics;

• Case study: searching music into an existing database using all the modalities

• Multilinguality supported: English, French , German

• difficulties: multi-modal training corpora not available, Information flow: several models for the same language (one for speech, one for parsing : how to cover the same language), changing specifications during the project

Adapting parser’s resources •TAG widely accepted formalism for syntactic parsing

•XML standard for grammars (TAGML) and for semantic representations (MMIL)

•existing resources for English, French for free texts

Speech recognizer’s language model•Statistical language model

•400 words vocabulary

•Huge number of possible sentences

•Training corpus: generated with context-free grammar. Two steps:

•Bigram model using classes (ex: P(DECADES|the)

•Uniform distribution of words into classes: P(90’s|DECADES)

User Scenarios•useful in the context of lacking real human-machine interactions;•designed to obtain homogeneous linguistic coverage, for all the languages:

•several styles (or registers - familiar, elaborated);•specific phrases (politeness phrases, time intervals - "from the sixties");•various syntactic components (passive constructions, relative clauses, questions and ellipses);•dates or names

•developers worked independently•building exhaustive user scenarios

Context-free grammar •covers the linguistic phenomena from the user scenario, for every language

Technical vocabulary•covers the linguistic phenomena from the user scenario, for every language

Developping the language resources